# Assignment - Transliteration

In this task you are required to solve the problem of transliterating names from English to Russian. Transliterating a string means writing it in the alphabet of another language while preserving the pronunciation, although the pronunciation is not always preserved exactly.


## Instructions

To complete the assignment, please do the following steps (both are required to get full credit):

### 1. Complete this notebook

Upload the completed notebook with your code (this file). You will be asked to implement a transformer-based approach for transliteration.

Implement your ``train`` and ``classify`` functions in the cells below. Your model should be implemented as a dedicated class or function in this notebook. If you add any external dependencies, make sure that everything is imported correctly and that the results are reproducible.


### 2. Submit solution to the shared task

After implementing the model architecture, participate in the [competition](https://competitions.codalab.org/competitions/30932) and solve the **Transliteration** task using your code.

Use your code from the previous part to train, validate, and generate predictions for the public (Practice) and private (Evaluation) test sets. The code produces predictions (`preds_translit.tsv`) for the dataset and scores them if the true answers are present. You can use these scores to evaluate your models on the dev set and choose the best one. Be sure to download the [dataset](https://github.com/skoltech-nlp/filimdb_evaluation/blob/master/TRANSLIT.tar.gz) with the `wget` command, unpack it, and run these commands from notebook cells.

For your best results, upload the resulting TSV file with predictions (``preds_translit.tsv``), packed into a ``.zip`` archive, to both phases of the competition.
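
The packing step can be sketched as follows. This is only a minimal illustration using the standard `zipfile` module: the helper name `zip_predictions` and the archive name `preds_translit.zip` are assumptions, not names required by the competition.

```python
import os
import zipfile


def zip_predictions(tsv_path="preds_translit.tsv", zip_path="preds_translit.zip"):
    """Pack the predictions file into a .zip archive (hypothetical helper)."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # Store the file at the archive root, without any directory prefix.
        zf.write(tsv_path, arcname=os.path.basename(tsv_path))
    return zip_path
```

Calling `zip_predictions()` after the predictions file has been written would create the archive next to it.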


**Important: You must indicate "DL4NLP-23" as your team name in Codalab. Without it your submission will be invalid!**


## Basic algorithm

The baseline rests on a simple idea: a character n-gram in the source language is mapped to an n-gram of the same size in the target language, using the most frequent transformation rule observed in the training set.
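
To make the idea concrete, here is a toy sketch of the counting step for unigrams. The function name `count_ngram_rules` and the three training pairs are made up for illustration and are not taken from the dataset; like the baseline code further down, the sketch counts every source n-gram against every target n-gram of the same pair, so the statistics for rare letters are noisy.

```python
from collections import Counter, defaultdict


def count_ngram_rules(pairs, n):
    """Count how often each source n-gram co-occurs with each target n-gram."""
    rules = defaultdict(Counter)
    for src, dst in pairs:
        src_ngrams = [src[i:i + n] for i in range(len(src) - n + 1)]
        dst_ngrams = [dst[i:i + n] for i in range(len(dst) - n + 1)]
        for s in src_ngrams:
            for d in dst_ngrams:
                rules[s][d] += 1
    return rules


# Made-up toy pairs (Latin source, Cyrillic target).
toy_pairs = [("ab", "аб"), ("ak", "ак"), ("at", "ат")]
rules = count_ngram_rules(toy_pairs, n=1)

# The most frequent target unigram seen together with the Latin "a".
best_for_a = max(rules["a"], key=rules["a"].get)
print(dict(rules["a"]), best_for_a)
```

In this toy data the Latin "a" co-occurs with the Cyrillic "а" in all three pairs and with every other letter only once, so "а" becomes its transformation rule.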

To test the implementation, download the data, unzip the datasets, predict transliteration and run the evaluation script. To do this, you need to run the following commands:

In [None]:
!wget https://github.com/s-nlp/filimdb_evaluation/raw/master/TRANSLIT.tar.gz

--2023-04-11 07:32:46--  https://github.com/s-nlp/filimdb_evaluation/raw/master/TRANSLIT.tar.gz
Resolving github.com (github.com)... 140.82.113.4
Connecting to github.com (github.com)|140.82.113.4|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/s-nlp/filimdb_evaluation/master/TRANSLIT.tar.gz [following]
--2023-04-11 07:32:47--  https://raw.githubusercontent.com/s-nlp/filimdb_evaluation/master/TRANSLIT.tar.gz
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1546458 (1.5M) [application/octet-stream]
Saving to: ‘TRANSLIT.tar.gz’


2023-04-11 07:32:47 (25.6 MB/s) - ‘TRANSLIT.tar.gz’ saved [1546458/1546458]



In [None]:
!gunzip TRANSLIT.tar.gz

In [None]:
!tar -xf TRANSLIT.tar

### Baseline code

In [None]:
from typing import List, Any
from random import random
import collections as col

def baseline_train(
        train_source_strings: List[str],
        train_target_strings: List[str]) -> Any:
    """
    Trains a transliteration model on the given train set represented as
    parallel lists of input strings and their transliterations.
    :param train_source_strings: a list of strings, one str per example
    :param train_target_strings: a list of strings, one str per example
    :return: learnt parameters, or any object you like (it will be passed to the classify function)
    """

    ngram_lvl = 3
    def obtain_train_dicts(train_source_strings, train_target_strings,
                            ngram_lvl):
        ngrams_dict = col.defaultdict(lambda: col.defaultdict(int))
        for src_str,dst_str in zip(train_source_strings,
                                        train_target_strings):
            try:
                src_ngrams = [src_str[i:i+ngram_lvl] for i in
                                range(len(src_str)-ngram_lvl+1)]
                dst_ngrams = [dst_str[i:i+ngram_lvl] for i in
                                range(len(dst_str)-ngram_lvl+1)]
            except TypeError as e:
                # A non-string example ended up in the data: report the
                # offending pair and re-raise the original error.
                print(src_str, dst_str)
                print(e)
                raise
            for src_ngram in src_ngrams:
                for dst_ngram in dst_ngrams:
                    ngrams_dict[src_ngram][dst_ngram] += 1
        return ngrams_dict
        
    ngrams_dict = col.defaultdict(lambda: col.defaultdict(int))
    for nl in range(1, ngram_lvl+1):
        ngrams_dict.update(
            obtain_train_dicts(train_source_strings,
                            train_target_strings, nl))
    return ngrams_dict 


def baseline_classify(strings: List[str], params: Any) -> List[str]:
    """
    Classify strings given previously learnt parameters.
    :param strings: strings to classify
    :param params: parameters received from train function
    :return: list of lists of predicted transliterated strings
      (for each source string -> [top_1 prediction, .., top_k prediction]
        if it is possible to generate more than one, otherwise
        -> [prediction])
        corresponding to the given list of strings
    """
       
    def predict_one_sample(sample, train_dict, ngram_lvl=1):
        # Split the sample into consecutive non-overlapping n-grams;
        # the last chunk may be shorter and holds the leftover characters.
        ngrams = [sample[i:i+ngram_lvl]
                  for i in range(0, len(sample), ngram_lvl)]
        prediction = ''
        for ngram in ngrams:
            ngram_dict = train_dict[ngram]
            if len(ngram_dict.keys()) == 0:
                prediction += '?'*len(ngram)
            else:
                prediction += max(ngram_dict, key=lambda k: ngram_dict[k])
        return prediction 
    
    ngram_lvl = 3
    predictions = []
    ngrams_dict = params
    for string in strings:
        top_1_pred = predict_one_sample(string, ngrams_dict,
                                                ngram_lvl)
        predictions.append([top_1_pred])
    return predictions

### Evaluation code

In [None]:
PREDS_FNAME = "preds_translit_baseline.tsv"
SCORED_PARTS = ('train', 'dev', 'train_small', 'dev_small', 'test')
TRANSLIT_PATH = "TRANSLIT"

In [None]:
import codecs
import os  # used by load_dataset and load_transliterations_only
from pandas import read_csv

def load_dataset(data_dir_path=None, parts: List[str] = SCORED_PARTS):
    part2ixy = {}
    for part in parts:
        path = os.path.join(data_dir_path, f'{part}.tsv')
        with open(path, 'r', encoding='utf-8') as rf:
            # first line is a header of the corresponding columns
            lines = rf.readlines()[1:]
            col_count = len(lines[0].strip('\n').split('\t'))
            if col_count == 2:
                strings, transliterations = zip(
                    *list(map(lambda l: l.strip('\n').split('\t'), lines))
                )
            elif col_count == 1:
                strings = list(map(lambda l: l.strip('\n'), lines))
                transliterations = None
            else:
                raise ValueError("wrong amount of columns")
        part2ixy[part] = (
            [f'{part}/{i}' for i in range(len(strings))],
            strings, transliterations,
        )
    return part2ixy


def load_transliterations_only(data_dir_path=None, parts: List[str] = SCORED_PARTS):
    part2iy = {}
    for part in parts:
        path = os.path.join(data_dir_path, f'{part}.tsv')
        with open(path, 'r', encoding='utf-8') as rf:
            # first line is a header of the corresponding columns
            lines = rf.readlines()[1:]
            col_count = len(lines[0].strip('\n').split('\t'))
            n_lines = len(lines)
            if col_count == 2:
                transliterations = [l.strip('\n').split('\t')[1] for l in lines]
            elif col_count == 1:
                transliterations = None
            else:
                raise ValueError("Wrong amount of columns")
        part2iy[part] = (
            [f'{part}/{i}' for i in range(n_lines)],
            transliterations,
        )
    return part2iy


def save_preds(preds, preds_fname):
    """
    Save classifier predictions in format appropriate for scoring.
    """
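    # Write explicitly in UTF-8 so that Cyrillic predictions do not depend
    # on the locale; the loop variable no longer shadows the argument.
    with codecs.open(preds_fname, 'w', encoding='utf-8') as outp:
        for idx, sample_preds in preds:
            print(idx, *sample_preds, sep='\t', file=outp)
    print('Predictions saved to %s' % preds_fname)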


def load_preds(preds_fname, top_k=1):
    """
    Load classifier predictions in format appropriate for scoring.
    """
    kwargs = {
        "filepath_or_buffer": preds_fname,
        "names": ["id", "pred"],
        "sep": '\t',
    }

    pred_ids = list(read_csv(**kwargs, usecols=["id"])["id"])

    pred_y = {
        pred_id: [y]
        for pred_id, y in zip(
            pred_ids, read_csv(**kwargs, usecols=["pred"])["pred"]
        )
    }

    for y in pred_y.values():
        assert len(y) == top_k

    return pred_ids, pred_y


def compute_hit_k(preds, k=10):
    raise NotImplementedError


def compute_mrr(preds):
    raise NotImplementedError


def compute_acc_1(preds, true):
    right_answers = 0
    bonus = 0
    for pred, y in zip(preds, true):
        if pred[0] == y:
            right_answers += 1
        elif pred[0] != pred[0] and y == 'нань':
            print('Your test file contained empty string, skipping %f and %s' % (pred[0], y))
            bonus += 1 # bugfix: skip empty line in test
    return right_answers / (len(preds) - bonus)


def score(preds, true):
    assert len(preds) == len(true), 'inconsistent amount of predictions and ground truth answers'
    acc_1 = compute_acc_1(preds, true)
    return {'acc@1': acc_1}


def score_preds(preds_path, data_dir, parts=SCORED_PARTS):
    part2iy = load_transliterations_only(data_dir, parts=parts)
    pred_ids, pred_dict = load_preds(preds_path)
    # pred_dict = {i:y for i,y in zip(pred_ids, pred_y)}
    scores = {}
    for part, (true_ids, true_y) in part2iy.items():
        if true_y is None:
            print('no labels for %s set' % part)
            continue
        pred_y = [pred_dict[i] for i in true_ids]
        score_values = score(pred_y, true_y)
        acc_1 = score_values['acc@1']
        print('%s set accuracy@1: %.2f' % (part, acc_1))
        scores[part] = score_values 
    return scores

### Train and predict results

In [None]:
from time import time
import numpy as np
import os


def train_and_predict(translit_path, scored_parts):
    top_k = 1
    part2ixy = load_dataset(translit_path, parts=scored_parts)
    train_ids, train_strings, train_transliterations = part2ixy['train']
    print('\nTraining classifier on %d examples from train set ...' % len(train_strings))
    st = time()
    params = baseline_train(train_strings, train_transliterations)
    print('Classifier trained in %.2fs' % (time() - st))

    allpreds = []
    for part, (ids, x, y) in part2ixy.items():
        print('\nClassifying %s set with %d examples ...' % (part, len(x)))
        st = time()
        preds = baseline_classify(x, params)
        print('%s set classified in %.2fs' % (part, time() - st))
        count_of_values = list(map(len, preds))
        assert np.all(np.array(count_of_values) == top_k)
        #score(preds, y)
        allpreds.extend(zip(ids, preds))

    save_preds(allpreds, preds_fname=PREDS_FNAME)
    print('\nChecking saved predictions ...')
    return score_preds(preds_path=PREDS_FNAME, data_dir=translit_path, parts=scored_parts)

In [None]:
train_and_predict(TRANSLIT_PATH, SCORED_PARTS)


Training classifier on 105371 examples from train set ...
Classifier trained in 3.56s

Classifying train set with 105371 examples ...
train set classified in 22.14s

Classifying dev set with 26342 examples ...
dev set classified in 4.96s

Classifying train_small set with 2000 examples ...
train_small set classified in 0.36s

Classifying dev_small set with 2000 examples ...
dev_small set classified in 0.38s

Classifying test set with 32926 examples ...
test set classified in 7.49s
Predictions saved to preds_translit_baseline.tsv

Checking saved predictions ...
train set accuracy@1: 0.33
dev set accuracy@1: 0.31
train_small set accuracy@1: 0.34
dev_small set accuracy@1: 0.32
no labels for test set


{'train': {'acc@1': 0.32907536229133255},
 'dev': {'acc@1': 0.3112899552046162},
 'train_small': {'acc@1': 0.3365},
 'dev_small': {'acc@1': 0.323}}

## Transformer-based approach


To implement your algorithm, use the template code below and modify it where required.

First, fill in the missing details in the code of the Transformer architecture and implement the methods of the `LrScheduler` class, which is responsible for updating the learning rate during training.
Next, select the hyperparameters for the model according to the proposed guide.

In [None]:
!pip install Levenshtein

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting Levenshtein
  Downloading Levenshtein-0.20.9-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (175 kB)
Collecting rapidfuzz<3.0.0,>=2.3.0
  Downloading rapidfuzz-2.15.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.0 MB)
Installing collected packages: rapidfuzz, Levenshtein
Successfully installed Levenshtein-0.20.9 rapidfuzz-2.15.1


In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.utils.data as torch_data

import pandas as pd
import numpy as np
import itertools as it
import collections as col
import random
import os
import copy
import json
from tqdm import tqdm
import datetime, time
import math

import Levenshtein as le

### Load dataset and embeddings

In [None]:
def load_datasets(data_dir_path, parts):
    datasets = {}
    for part in parts:
        path = os.path.join(data_dir_path, f'{part}.tsv')
        datasets[part] = pd.read_csv(path, sep='\t', na_filter=False)
        print(f'Loaded {part} dataset, length: {len(datasets[part])}')
    return datasets

In [None]:
class TextEncoder:
    def __init__(self, load_dir_path=None):
        self.lang_keys = ['en', 'ru']
        self.directions = ['id2token', 'token2id']
        self.service_token_names = {
            'pad_token': '<pad>',
            'start_token': '<start>',
            'unk_token': '<unk>',
            'end_token': '<end>'
        }
        service_id2token = dict(enumerate(self.service_token_names.values()))
        service_token2id ={v:k for k,v in service_id2token.items()}
        self.service_vocabs = dict(zip(self.directions,
                                       [service_id2token, service_token2id]))
        if load_dir_path is None:
            self.vocabs = {}
            for lk in self.lang_keys:
                self.vocabs[lk] = copy.deepcopy(self.service_vocabs)
        else:
            self.vocabs = self.load_vocabs(load_dir_path)
    def load_vocabs(self, load_dir_path):
        vocabs = {}
        load_path = os.path.join(load_dir_path, 'vocabs')
        for lk in self.lang_keys:
            vocabs[lk] = {}
            for d in self.directions:
                columns = d.split('2')
                print(lk, d)
                df = pd.read_csv(os.path.join(load_path, f'{lk}_{d}'))
                vocabs[lk][d] = dict(zip(*[df[c] for c in columns]))
        return vocabs
    
    def save_vocabs(self, save_dir_path):
        save_path = os.path.join(save_dir_path, 'vocabs')
        os.makedirs(save_path, exist_ok=True)
        for lk in self.lang_keys:
            for d in self.directions:
                columns = d.split('2')
                pd.DataFrame(data=self.vocabs[lk][d].items(),
                    columns=columns).to_csv(os.path.join(save_path, f'{lk}_{d}'),
                                                index=False,
                                                sep=',')
    def make_vocabs(self, data_df):
        for lk in self.lang_keys:
            tokens = col.Counter(''.join(list(it.chain(*data_df[lk])))).keys()
            part_id2t = dict(enumerate(tokens, start=len(self.service_token_names)))
            part_t2id = {k:v for v,k in part_id2t.items()}
            part_vocabs = [part_id2t, part_t2id]
            for i in range(len(self.directions)):
                self.vocabs[lk][self.directions[i]].update(part_vocabs[i])
                
        self.src_vocab_size = len(self.vocabs['en']['id2token'])
        self.tgt_vocab_size = len(self.vocabs['ru']['id2token'])
                
    def frame(self, sample, start_token=None, end_token=None):
        if start_token is None:
            start_token=self.service_token_names['start_token']
        if end_token is None:
            end_token=self.service_token_names['end_token']
        return [start_token] + sample + [end_token]
    def token2id(self, samples, frame, lang_key):
        if frame:
            samples = list(map(self.frame, samples))
        vocab = self.vocabs[lang_key]['token2id']
        return list(map(lambda s:
                        [vocab[t] if t in vocab.keys() else vocab[self.service_token_names['unk_token']]
                         for t in s], samples))
    
    def unframe(self, sample, start_token=None, end_token=None):
        if start_token is None:
            start_token=self.service_vocabs['token2id'][self.service_token_names['start_token']]
        if end_token is None:
            end_token=self.service_vocabs['token2id'][self.service_token_names['end_token']]
        pad_token=self.service_vocabs['token2id'][self.service_token_names['pad_token']]
        return list(it.takewhile(lambda e: e != end_token and e != pad_token, sample[1:]))
    def id2token(self, samples, unframe, lang_key):
        if unframe:
            samples = list(map(self.unframe, samples))
        vocab = self.vocabs[lang_key]['id2token']
        return list(map(lambda s:
                        [vocab[idx] if idx in vocab.keys() else self.service_token_names['unk_token'] for idx in s], samples))


class TranslitData(torch_data.Dataset):
    def __init__(self, source_strings, target_strings,
                text_encoder):
        super(TranslitData, self).__init__()
        self.source_strings = source_strings
        self.text_encoder = text_encoder
        if target_strings is not None:
            assert len(source_strings) == len(target_strings)
            self.target_strings = target_strings
        else:
            self.target_strings = None
    def __len__(self):
        return len(self.source_strings)
    def __getitem__(self, idx):
        src_str = self.source_strings[idx]
        encoder_input = self.text_encoder.token2id([list(src_str)], frame=True, lang_key='en')[0]
        if self.target_strings is not None:
            tgt_str = self.target_strings[idx]
            tmp = self.text_encoder.token2id([list(tgt_str)], frame=True, lang_key='ru')[0]
            decoder_input = tmp[:-1]
            decoder_target = tmp[1:]
            return (encoder_input, decoder_input, decoder_target)
        else:
            return (encoder_input,)


class BatchSampler(torch_data.BatchSampler):
    def __init__(self, sampler, batch_size, drop_last, shuffle_each_epoch):
        super(BatchSampler, self).__init__(sampler, batch_size, drop_last)
        self.batches = []
        for b in super(BatchSampler, self).__iter__():
            self.batches.append(b)
        self.shuffle_each_epoch = shuffle_each_epoch
        if self.shuffle_each_epoch:
            random.shuffle(self.batches)
        self.index = 0
        #print(f'Batches collected: {len(self.batches)}')
    def __iter__(self):
        self.index = 0
        return self
    def __next__(self):
        if self.index == len(self.batches):
            if self.shuffle_each_epoch:
                random.shuffle(self.batches)
            raise StopIteration
        else:
            batch = self.batches[self.index]
            self.index += 1
            return batch

def collate_fn(batch_list):
    '''batch_list can store either 3 components:
        encoder_inputs, decoder_inputs, decoder_targets
        or single component: encoder_inputs'''
    components = list(zip(*batch_list))
    batch_tensors = []
    for data in components:
        max_len = max([len(sample) for sample in data])
        #print(f'Maximum length in batch = {max_len}')
        sample_tensors = [torch.tensor(s, requires_grad=False, dtype=torch.int64)
                         for s in data]
        batch_tensors.append(nn.utils.rnn.pad_sequence(
            sample_tensors,
            batch_first=True, padding_value=0))
    return tuple(batch_tensors) 


def create_dataloader(source_strings, target_strings,
                      text_encoder, batch_size,
                      shuffle_batches_each_epoch):
    '''target_strings parameter can be None'''
    dataset = TranslitData(source_strings, target_strings,
                                text_encoder=text_encoder)
    seq_sampler = torch_data.SequentialSampler(dataset)
    batch_sampler = BatchSampler(seq_sampler, batch_size=batch_size,
                                drop_last=False,
                                shuffle_each_epoch=shuffle_batches_each_epoch)
    dataloader = torch_data.DataLoader(dataset,
                                       batch_sampler=batch_sampler,
                                       collate_fn=collate_fn)
    return dataloader

### Metric function

In [None]:
def compute_metrics(predicted_strings, target_strings, metrics):
    metric_values = {}
    for m in metrics:
        if m == 'acc@1':
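            # Convert to arrays so that the element-wise comparison also works for plain lists
            matches = np.asarray(predicted_strings) == np.asarray(target_strings)
            metric_values[m] = np.sum(matches) / len(target_strings)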
        elif m =='mean_ld@1':
            metric_values[m] =\
                np.mean(list(map(lambda e: le.distance(*e), zip(predicted_strings, target_strings))))
        else: 
            raise ValueError(f'Unknown metric: {m}')
    return metric_values

### Positional Encoding

As you remember, the Transformer treats an input sequence of elements as a time series. Because the Encoder inside the Transformer processes the entire input sequence simultaneously, nothing else in the model identifies the position of an element, so this information has to be encoded in the element's embedding. This is the job of the `PositionalEncoding` layer, which adds to each embedding a position-dependent vector of the same dimension.
Let $PE$ denote the matrix of these vectors, with one row for each position of the time series. Then the elements of the matrix are:

$$ PE_{(pos,2i)} = \sin{(pos/10000^{2i/d_{model}})}$$
$$ PE_{(pos,2i+1)} = \cos{(pos/10000^{2i/d_{model}})}$$

where $pos$ is the position, $i$ indexes the components of the corresponding vector, and $d_{model}$ is the dimension of each vector. Thus, even components hold sine values and odd components hold cosine values, with different arguments.

In this task you are required to implement these formulas in the constructor of the `PositionalEncoding` class in the main file ``translit.py``, which you are to upload. To run the test, use the following function:

`test_positional_encoding()`

Make sure that no `AssertionError` is raised!
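
As a quick sanity check of the formulas, one row of $PE$ can be computed by hand in plain Python. This is only an illustrative sketch, independent of the PyTorch class: the helper `positional_encoding_row` is a hypothetical name introduced here.

```python
import math


def positional_encoding_row(pos, d_model):
    """Return the positional encoding vector for one position (hypothetical helper)."""
    row = []
    for k in range(d_model):
        i = k // 2  # components 2i and 2i+1 share the same argument
        angle = pos / (10000 ** (2 * i / d_model))
        # Even components use sine, odd components use cosine.
        row.append(math.sin(angle) if k % 2 == 0 else math.cos(angle))
    return row


row = positional_encoding_row(pos=1, d_model=4)
print(row)
```

For `pos=1` and `d_model=4` the arguments are $1$ and $1/100$, so the row agrees, within the test tolerance of `1e-4`, with the second row of the tensor expected by `test_positional_encoding()`.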


In [None]:
class Embedding(nn.Module):
    def __init__(self, hidden_size, vocab_size):
        super(Embedding, self).__init__()
        self.emb_layer = nn.Embedding(vocab_size, hidden_size)
        self.hidden_size = hidden_size

    def forward(self, x):
        return self.emb_layer(x)

class PositionalEncoding(nn.Module):
    def __init__(self, hidden_size, max_len=512):
        super(PositionalEncoding, self).__init__()
        self.hidden_size = hidden_size
        self.max_len = max_len
        pe = torch.zeros(max_len, hidden_size, requires_grad=False)
        # TODO: implement your code here 
        position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, hidden_size, 2).float() * (-math.log(10000.0) / hidden_size))
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        pe = pe.unsqueeze(0)
        # pe shape: (1, max_len, hidden_size)
        self.register_buffer('pe', pe)

    def forward(self, x):
        # x: shape (batch size, sequence length, hidden size)
        x = x + self.pe[:, :x.size(1)]
        return x

In [None]:
def test_positional_encoding():
    pe = PositionalEncoding(max_len=3, hidden_size=4)
    res_1 = torch.tensor([[[ 0.0000,  1.0000,  0.0000,  1.0000],
                           [ 0.8415,  0.5403,  0.0100,  0.9999],
                           [ 0.9093, -0.4161,  0.0200,  0.9998]]])
    # print(pe.pe - res_1)
    assert torch.all(torch.abs(pe.pe - res_1) < 1e-4).item()
    print('Test is passed!')

In [None]:
test_positional_encoding()

Test is passed!


### LayerNorm

In [None]:
class LayerNorm(nn.Module):
    "Layer Normalization layer"

    def __init__(self, hidden_size, eps=1e-6):
        super(LayerNorm, self).__init__()
        self.gain = nn.Parameter(torch.ones(hidden_size))
        self.bias = nn.Parameter(torch.zeros(hidden_size))
        self.eps = eps

    def forward(self, x):
        mean = x.mean(-1, keepdim=True)
        std = x.std(-1, keepdim=True)
        return self.gain * (x - mean) / (std + self.eps) + self.bias

### SublayerConnection

In [None]:
class SublayerConnection(nn.Module):
    """
    A residual connection followed by a layer normalization.
    """

    def __init__(self, hidden_size, dropout):
        super(SublayerConnection, self).__init__()
        self.layer_norm = LayerNorm(hidden_size)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, sublayer):
        return self.layer_norm(x + self.dropout(sublayer(x)))

def padding_mask(x, pad_idx=0):
    assert len(x.size()) >= 2
    return (x != pad_idx).unsqueeze(-2)

def look_ahead_mask(size):
    "Mask out the right context"
    attn_shape = (1, size, size)
    look_ahead_mask = np.triu(np.ones(attn_shape), k=1).astype('uint8')
    return torch.from_numpy(look_ahead_mask) == 0

def compositional_mask(x, pad_idx=0):
    pm = padding_mask(x, pad_idx=pad_idx)
    seq_length = x.size(-1)
    result_mask = pm & \
                  look_ahead_mask(seq_length).type_as(pm.data)
    return result_mask

### FeedForward

In [None]:
class FeedForward(nn.Module):
    def __init__(self, hidden_size, ff_hidden_size, dropout=0.1):
        super(FeedForward, self).__init__()
        self.pre_linear = nn.Linear(hidden_size, ff_hidden_size)
        self.post_linear = nn.Linear(ff_hidden_size, hidden_size)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        return self.post_linear(self.dropout(F.relu(self.pre_linear(x))))

def clone_layer(module, N):
    "Produce N identical layers."
    return nn.ModuleList([copy.deepcopy(module) for _ in range(N)])

### MultiHeadAttention


Next, you are required to implement the `attention` method of the `MultiHeadAttention` class. The layer takes as input the matrices $Q$, $K$ and $V$, which hold the query, key and value vectors for each step of the sequence. Each query, key and value vector is obtained from the output of the previous layer by a linear projection with one of three trainable parameter matrices. These semantics can be expressed by the following formulas:
$$
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{T}}{\sqrt{d_{k}}}\right) V
$$

$$
\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}\left(\mathrm{head}_1, \ldots, \mathrm{head}_h\right) W^O
$$

$$
\mathrm{head}_i = \mathrm{Attention}\left(Q W_i^Q, K W_i^K, V W_i^V\right)
$$
Here $h$ is the number of attention heads, i.e. parallel sub-layers that each apply Scaled Dot-Product Attention to vectors of smaller dimension ($d_{k} = d_{q} = d_{v} = d_{model} / h$).
The logic of `MultiHeadAttention` is shown in the picture (from the original [paper](https://arxiv.org/abs/1706.03762)):

![](https://lilianweng.github.io/lil-log/assets/images/transformer.png)


Inside the `attention` method, use the dropout layer created in the `MultiHeadAttention` class constructor. Apply it directly to the attention weights, i.e. to the result of the softmax operation. The drop probability is set at training time through `model_config['dropout']['attention']`.

The correctness of implementation can be checked with
`test_multi_head_attention()`
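
To see what the attention formula computes on concrete numbers, here is a small single-head sketch in plain Python (lists instead of tensors, no batching, no dropout). It is meant only as an illustration of the formula, not as the required PyTorch implementation; the function names and the toy matrices are assumptions made for this example.

```python
import math


def softmax(row):
    # Subtract the maximum for numerical stability.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q, K, V, mask=None):
    """Q, K, V are lists of row vectors; mask[i][j] is True if query i may attend to key j."""
    d_k = len(K[0])
    weights, output = [], []
    for i, q in enumerate(Q):
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        if mask is not None:
            # Masked positions get a large negative score before softmax.
            scores = [s if mask[i][j] else -1e9 for j, s in enumerate(scores)]
        w = softmax(scores)
        weights.append(w)
        output.append([sum(w[j] * V[j][c] for j in range(len(V)))
                       for c in range(len(V[0]))])
    return output, weights


# Toy example: 3 positions, d_k = 2, with a look-ahead (lower-triangular) mask.
Q = K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
causal = [[j <= i for j in range(3)] for i in range(3)]
out, w = scaled_dot_product_attention(Q, K, V, mask=causal)
print(w[0], out[0])
```

With the look-ahead mask the first position can only attend to itself, so its weights collapse onto the first key and its output equals the first value vector.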



In [None]:
class MultiHeadAttention(nn.Module):
    def __init__(self, n_heads, hidden_size, dropout=None):
        super(MultiHeadAttention, self).__init__()
        assert hidden_size % n_heads == 0
        self.head_hidden_size = hidden_size // n_heads
        self.n_heads = n_heads
        self.linears = clone_layer(nn.Linear(hidden_size, hidden_size), 4)
        self.attn_weights = None
        self.dropout = dropout
        if self.dropout is not None:
            self.dropout_layer = nn.Dropout(p=self.dropout)

    def attention(self, query, key, value, mask):
        """Compute 'Scaled Dot Product Attention'
            query, key and value tensors have the same shape:
                (batch size, number of heads, sequence length, head hidden size)
            mask shape: (batch size, 1, sequence length, sequence length)
                '1' dimension value will be broadcasted to number of heads inside your operations
            mask should be applied before using softmax to get attn_weights
        """
        ## attn_weights shape: (batch size, number of heads, sequence length, sequence length)
        ## output shape: (batch size, number of heads, sequence length, head hidden size)
        ## TODO: provide your implementation here
        ## don't forget to apply dropout to attn_weights if self.dropout is not None
        d_k = query.size(-1)
        # Scaled dot-product scores: (batch size, number of heads, sequence length, sequence length)
        scores = torch.matmul(query, key.transpose(-2, -1)) / math.sqrt(d_k)
        if mask is not None:
            # Masked positions get a large negative score, so softmax gives them ~0 weight
            scores = scores.masked_fill(mask == 0, -1e9)
        attn_weights = F.softmax(scores, dim=-1)
        if self.dropout is not None:
            # Reuse the dropout layer created in the constructor, so that it
            # follows the train()/eval() mode of the module
            attn_weights = self.dropout_layer(attn_weights)

        output = torch.matmul(attn_weights, value)
        return output, attn_weights

    def forward(self, query, key, value, mask=None):
        if mask is not None:
            # Same mask applied to all h heads.
            mask = mask.unsqueeze(1)
        batch_size = query.size(0)

        # Split vectors for different attention heads (from hidden_size => n_heads x head_hidden_size)
        # and do separate linear projection, for separate trainable weights
        query, key, value = \
            [l(x).view(batch_size, -1, self.n_heads, self.head_hidden_size).transpose(1, 2)
             for l, x in zip(self.linears, (query, key, value))]

        x, self.attn_weights = self.attention(query, key, value, mask=mask)
        # x shape: (batch size, number of heads, sequence length, head hidden size)
        # self.attn_weights shape: (batch size, number of heads, sequence length, sequence length)

        # Concatenate the output of each head
        x = x.transpose(1, 2).contiguous() \
            .view(batch_size, -1, self.n_heads * self.head_hidden_size)

        return self.linears[-1](x)

In [None]:
def test_multi_head_attention():
    mha = MultiHeadAttention(n_heads=1, hidden_size=5, dropout=None)
    # batch_size == 2, sequence length == 3, hidden_size == 5
    # query = torch.arange(150).reshape(2, 3, 5)
    query = torch.tensor([[[[ 0.64144618, -0.95817388,  0.37432297,  0.58427106,
          -0.94668716]],
        [[-0.23199289,  0.66329209, -0.46507035, -0.54272512,
          -0.98640698]],
        [[ 0.07546638, -0.09277002,  0.20107185, -0.97407381,
          -0.27713414]]],
       [[[ 0.14727783,  0.4747886 ,  0.44992016, -0.2841419 ,
          -0.81820319]],
        [[-0.72324994,  0.80643179, -0.47655449,  0.45627872,
           0.60942404]],
        [[ 0.61712569, -0.62947282, -0.95215713, -0.38721959,
          -0.73289725]]]])
    key = torch.tensor([[[[-0.81759856, -0.60049991, -0.05923424,  0.51898901,
          -0.3366209 ]],
        [[ 0.83957818, -0.96361722,  0.62285191,  0.93452467,
           0.51219613]],
        [[-0.72758847,  0.41256154,  0.00490795,  0.59892503,
          -0.07202049]]],
       [[[ 0.72315339, -0.49896314,  0.94254637, -0.54356006,
          -0.04837949]],
        [[ 0.51759322, -0.43927061, -0.59924184,  0.92241702,
          -0.86811696]],
        [[-0.54322046, -0.92323003, -0.827746  ,  0.90842783,
           0.88428119]]]])
    value = torch.tensor([[[[-0.83895431,  0.805027  ,  0.22298283, -0.84849915,
          -0.34906026]],
        [[-0.02899652, -0.17456128, -0.17535998, -0.73160314,
          -0.13468061]],
        [[ 0.75234265,  0.02675947,  0.84766286, -0.5475651 ,
          -0.83319316]]],
       [[[-0.47834413,  0.34464645, -0.41921457,  0.33867964,
           0.43470836]],
        [[-0.99000979,  0.10220893, -0.4932273 ,  0.95938905,
           0.01927012]],
        [[ 0.91607137,  0.57395644, -0.90914179,  0.97212912,
           0.33078759]]]])
    query = query.float().transpose(1,2)
    key = key.float().transpose(1,2)
    value = value.float().transpose(1,2)

    x,_ = torch.max(query[:,0,:,:], axis=-1)
    mask = compositional_mask(x)
    mask.unsqueeze_(1)
    for n,t in [('query', query), ('key', key), ('value', value), ('mask', mask)]:
        print(f'Name: {n}, shape: {t.size()}')
    with torch.no_grad():
        output, attn_weights = mha.attention(query, key, value, mask=mask)
    assert output.size() == torch.Size([2,1,3,5])
    assert attn_weights.size() == torch.Size([2,1,3,3])

    truth_output = torch.tensor([[[[-0.8390,  0.8050,  0.2230, -0.8485, -0.3491],
          [-0.6043,  0.5212,  0.1076, -0.8146, -0.2870],
          [-0.0665,  0.2461,  0.3038, -0.7137, -0.4410]]],
        [[[-0.4783,  0.3446, -0.4192,  0.3387,  0.4347],
          [-0.7959,  0.1942, -0.4652,  0.7239,  0.1769],
          [-0.3678,  0.2868, -0.5799,  0.7987,  0.2086]]]])
    truth_attn_weights = torch.tensor([[[[1.0000, 0.0000, 0.0000],
          [0.7103, 0.2897, 0.0000],
          [0.3621, 0.3105, 0.3274]]],
        [[[1.0000, 0.0000, 0.0000],
          [0.3793, 0.6207, 0.0000],
          [0.2642, 0.4803, 0.2555]]]])
    # print(torch.abs(output - truth_output))
    # print(torch.abs(attn_weights - truth_attn_weights))
    assert torch.all(torch.abs(output - truth_output) < 1e-4).item()
    assert torch.all(torch.abs(attn_weights - truth_attn_weights) < 1e-4).item()
    print('Test is passed!')

In [None]:
test_multi_head_attention()

Name: query, shape: torch.Size([2, 1, 3, 5])
Name: key, shape: torch.Size([2, 1, 3, 5])
Name: value, shape: torch.Size([2, 1, 3, 5])
Name: mask, shape: torch.Size([2, 1, 3, 3])
Test is passed!


### Encoder

In [None]:
class EncoderLayer(nn.Module):
    "Encoder is made up of self-attn and feed forward (defined below)"

    def __init__(self, hidden_size, ff_hidden_size, n_heads, dropout):
        super(EncoderLayer, self).__init__()
        self.self_attn = MultiHeadAttention(n_heads, hidden_size,
                                            dropout=dropout['attention'])
        self.feed_forward = FeedForward(hidden_size, ff_hidden_size,
                                        dropout=dropout['relu'])
        self.sublayers = clone_layer(SublayerConnection(hidden_size, dropout['residual']), 2)

    def forward(self, x, mask):
        x = self.sublayers[0](x, lambda x: self.self_attn(x, x, x, mask))
        return self.sublayers[1](x, self.feed_forward)

class Encoder(nn.Module):
    def __init__(self, config):
        super(Encoder, self).__init__()
        self.embedder = Embedding(config['hidden_size'],
                                  config['src_vocab_size'])
        self.positional_encoder = PositionalEncoding(config['hidden_size'],
                                                     max_len=config['max_src_seq_length'])
        self.embedding_dropout = nn.Dropout(p=config['dropout']['embedding'])
        self.encoder_layer = EncoderLayer(config['hidden_size'],
                                          config['ff_hidden_size'],
                                          config['n_heads'],
                                          config['dropout'])
        self.layers = clone_layer(self.encoder_layer, config['n_layers'])
        self.layer_norm = LayerNorm(config['hidden_size'])

    def forward(self, x, mask):
        "Pass the input (and mask) through each layer in turn."
        x = self.embedding_dropout(self.positional_encoder(self.embedder(x)))
        for layer in self.layers:
            x = layer(x, mask)
        return self.layer_norm(x)

### Decoder

In [None]:
class DecoderLayer(nn.Module):
    """
    Decoder is made of 3 sublayers: self attention, encoder-decoder attention
    and feed forward
    """

    def __init__(self, hidden_size, ff_hidden_size, n_heads, dropout):
        super(DecoderLayer, self).__init__()

        self.self_attn = MultiHeadAttention(n_heads, hidden_size,
                                            dropout=dropout['attention'])
        self.encdec_attn = MultiHeadAttention(n_heads, hidden_size,
                                              dropout=dropout['attention'])
        self.feed_forward = FeedForward(hidden_size, ff_hidden_size,
                                        dropout=dropout['relu'])
        self.sublayers = clone_layer(SublayerConnection(hidden_size, dropout['residual']), 3)

    def forward(self, x, encoder_output, encoder_mask, decoder_mask):
        x = self.sublayers[0](x, lambda x: self.self_attn(x, x, x, decoder_mask))
        x = self.sublayers[1](x, lambda x: self.encdec_attn(x, encoder_output,
                                                            encoder_output, encoder_mask))
        return self.sublayers[2](x, self.feed_forward)

class Decoder(nn.Module):
    def __init__(self, config):
        super(Decoder, self).__init__()
        self.embedder = Embedding(config['hidden_size'],
                                  config['tgt_vocab_size'])
        self.positional_encoder = PositionalEncoding(config['hidden_size'],
                                                     max_len=config['max_tgt_seq_length'])
        self.embedding_dropout = nn.Dropout(p=config['dropout']['embedding'])
        self.decoder_layer = DecoderLayer(config['hidden_size'],
                                          config['ff_hidden_size'],
                                          config['n_heads'],
                                          config['dropout'])
        self.layers = clone_layer(self.decoder_layer, config['n_layers'])
        self.layer_norm = LayerNorm(config['hidden_size'])

    def forward(self, x, encoder_output, encoder_mask, decoder_mask):
        x = self.embedding_dropout(self.positional_encoder(self.embedder(x)))
        for layer in self.layers:
            x = layer(x, encoder_output, encoder_mask, decoder_mask)
        return self.layer_norm(x)

### Transformer

In [None]:
class Transformer(nn.Module):
    def __init__(self, config):
        super(Transformer, self).__init__()
        self.config = config
        self.encoder = Encoder(config)
        self.decoder = Decoder(config)
        self.proj = nn.Linear(config['hidden_size'], config['tgt_vocab_size'])

        self.pad_idx = config['pad_idx']
        self.tgt_vocab_size = config['tgt_vocab_size']

    def encode(self, encoder_input, encoder_input_mask):
        return self.encoder(encoder_input, encoder_input_mask)

    def decode(self, encoder_output, encoder_input_mask, decoder_input, decoder_input_mask):
        return self.decoder(decoder_input, encoder_output, encoder_input_mask, decoder_input_mask)

    def linear_project(self, x):
        return self.proj(x)

    def forward(self, encoder_input, decoder_input):
        encoder_input_mask = padding_mask(encoder_input, pad_idx=self.config['pad_idx'])
        decoder_input_mask = compositional_mask(decoder_input, pad_idx=self.config['pad_idx'])
        encoder_output = self.encode(encoder_input, encoder_input_mask)
        decoder_output = self.decode(encoder_output, encoder_input_mask,
                                     decoder_input, decoder_input_mask)
        output_logits = self.linear_project(decoder_output)
        return output_logits


def prepare_model(config):
    model = Transformer(config)

    for p in model.parameters():
        if p.dim() > 1:
            nn.init.xavier_uniform_(p)
    return model

####  LrScheduler

The last thing to prepare is the `LrScheduler` class, which updates the learning rate after every optimizer step. You are required to fill in the class constructor and the `learning_rate` method. The preferred strategy for updating the learning rate (lr) has two stages:

* "warmup" stage - lr increases linearly up to the peak value over a fixed number of steps (a proportion of all training steps, set by the parameter `train_config['lr_scheduler']['warmup_steps_part']` in the train function).
* "decrease" stage - lr decreases linearly to 0 over the remaining training steps.

A call to `learning_rate()` should return the lr value at the current step, whose number is stored in `self._step`. Besides `warmup_steps_part`, the class constructor takes the peak learning rate `lr_peak`, reached at the end of the "warmup" stage, and a string naming the learning rate scheduling strategy. You can test other strategies, selected through the `self.type` attribute, if you want to.
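The two stages can be sketched as a standalone function. This is a minimal illustration, not the required class: the helper name `warmup_linear_decay` and its argument order are assumptions made for this example, and the number of warmup steps is assumed to be the floor of `n_steps * warmup_steps_part`.

```python
import math


def warmup_linear_decay(step, n_steps, warmup_steps_part, lr_peak):
    """Learning rate at a given optimizer step (hypothetical helper)."""
    warmup_steps = math.floor(n_steps * warmup_steps_part)
    if warmup_steps > 0 and step <= warmup_steps:
        # "warmup" stage: grow linearly from 0 to lr_peak
        return lr_peak * step / warmup_steps
    # "decrease" stage: shrink linearly from lr_peak to 0 at the last step
    return lr_peak * (n_steps - step) / (n_steps - warmup_steps)


if __name__ == "__main__":
    for s in (0, 5, 10, 50, 100):
        print(s, warmup_linear_decay(s, n_steps=100, warmup_steps_part=0.1, lr_peak=3e-4))
```

With 100 steps and `warmup_steps_part` of 0.1, the peak is reached at step 10 and the rate returns to 0 at step 100.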

Correctness check: `test_lr_scheduler()`


In [None]:
class LrScheduler:
    def __init__(self, n_steps, **kwargs):
        self.type = kwargs['type']
        if self.type == 'warmup,decay_linear':
            ## TODO: provide your implementation here
            self.n_steps=n_steps
            self.lr_peak=kwargs['lr_peak']
            self.warmup_steps_part=kwargs['warmup_steps_part']
            #raise NotImplementedError
        else:
            raise ValueError(f'Unknown type argument: {self.type}')
        self._step = 0
        self._lr = 0

    def step(self, optimizer):
        self._step += 1
        lr = self.learning_rate()
        for p in optimizer.param_groups:
            p['lr'] = lr

    def learning_rate(self, step=None):
        if step is None:
            step = self._step
        if self.type == 'warmup,decay_linear':
            ## TODO: provide your implementation here
            if step <= np.floor((self.n_steps*self.warmup_steps_part)):
                self._lr = self.lr_peak * step / np.floor((self.n_steps*self.warmup_steps_part))
            else:
                self._lr = self.lr_peak * (self.n_steps - step) / (self.n_steps - np.floor(self.n_steps*self.warmup_steps_part))
        return self._lr

    def state_dict(self):
        sd = copy.deepcopy(self.__dict__)
        return sd

    def load_state_dict(self, sd):
        for k in sd.keys():
            self.__setattr__(k, sd[k])

In [None]:
def test_lr_scheduler():
    lrs_type = 'warmup,decay_linear'
    warmup_steps_part =  0.1
    lr_peak = 3e-4
    sch = LrScheduler(100, type=lrs_type, warmup_steps_part=warmup_steps_part,
                      lr_peak=lr_peak)
    # compare with abs() so that values that are too small also fail the check
    assert abs(sch.learning_rate(step=5) - 15e-5) < 1e-6
    assert abs(sch.learning_rate(step=10) - 3e-4) < 1e-6
    assert abs(sch.learning_rate(step=50) - 166e-6) < 1e-6
    assert abs(sch.learning_rate(step=100) - 0.) < 1e-6
    print('Test is passed!')

In [None]:
test_lr_scheduler()

Test is passed!


### Run and translate

In [None]:
from torch.nn.modules import dropout
def format_time(elapsed):
    '''
    Takes a time in seconds and returns a string hh:mm:ss
    '''
    elapsed_rounded = int(round((elapsed)))
    return str(datetime.timedelta(seconds=elapsed_rounded))


def run_epoch(data_iter, model, lr_scheduler, optimizer, smooth, device, verbose=False):
    start = time.time()
    local_start = start
    total_tokens = 0
    total_loss = 0
    tokens = 0
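    if smooth == 0:
        # NOTE: smooth == 0 selects PyTorch's built-in loss, which is still
        # configured here with label_smoothing=0.15
        loss_fn = nn.CrossEntropyLoss(reduction='sum', label_smoothing=0.15)
    else:
        loss_fn = SmoothCrossEntropyLoss(smoothing=smooth, reduction='sum')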

    for i, batch in tqdm(enumerate(data_iter)):
        encoder_input = batch[0].to(device)
        decoder_input = batch[1].to(device)
        decoder_target = batch[2].to(device)
        logits = model(encoder_input, decoder_input)
        loss = loss_fn(logits.view(-1, model.tgt_vocab_size),
                       decoder_target.view(-1))
        total_loss += loss.item()
        batch_n_tokens = (decoder_target != model.pad_idx).sum().item()
        total_tokens += batch_n_tokens
        if optimizer is not None:
            optimizer.zero_grad()
            lr_scheduler.step(optimizer)
            loss.backward()
            optimizer.step()

        tokens += batch_n_tokens
        if verbose and i % 1000 == 1:
            elapsed = time.time() - local_start
            print("batch number: %d, accumulated average loss: %f, tokens per second: %f" %
                  (i, total_loss / total_tokens, tokens / elapsed))
            local_start = time.time()
            tokens = 0

    average_loss = total_loss / total_tokens
    print('** End of epoch, accumulated average loss = %f **' % average_loss)
    epoch_elapsed_time = format_time(time.time() - start)
    print(f'** Elapsed time: {epoch_elapsed_time}**')
    return average_loss


def save_checkpoint(epoch, model, lr_scheduler, optimizer, model_dir_path):
    save_path = os.path.join(model_dir_path, f'cpkt_{epoch}_epoch')
    torch.save({
        'epoch': epoch,
        'model_state_dict': model.state_dict(),
        'optimizer_state_dict': optimizer.state_dict(),
        'lr_scheduler_state_dict': lr_scheduler.state_dict()
    }, save_path)
    print(f'Saved checkpoint to {save_path}')

def load_model(epoch, model_dir_path):
    save_path = os.path.join(model_dir_path, f'cpkt_{epoch}_epoch')
    checkpoint = torch.load(save_path)
    with open(os.path.join(model_dir_path, 'model_config.json'), 'r', encoding='utf-8') as rf:
        model_config = json.load(rf)
    model = prepare_model(model_config)
    model.load_state_dict(checkpoint['model_state_dict'])
    return model

def greedy_decode(model, device, encoder_input, max_len, start_symbol):
    batch_size = encoder_input.size()[0]
    decoder_input = torch.ones(batch_size, 1).fill_(start_symbol).type_as(encoder_input.data).to(device)

    for i in range(max_len):
        logits = model(encoder_input, decoder_input)

        _, predicted_ids = torch.max(logits, dim=-1)
        next_word = predicted_ids[:, i]
        # print(next_word)
        rest = torch.ones(batch_size, 1).type_as(decoder_input.data)
        # print(rest[:,0].size(), next_word.size())
        rest[:, 0] = next_word
        decoder_input = torch.cat([decoder_input, rest], dim=1).to(device)
        # print(decoder_input)
    return decoder_input

def generate_predictions(dataloader, max_decoding_len, text_encoder, model, device):
    # print(f'Max decoding length = {max_decoding_len}')
    model.eval()
    predictions = []
    start_token_id = text_encoder.service_vocabs['token2id'][
        text_encoder.service_token_names['start_token']]
    with torch.no_grad():
        for batch in tqdm(dataloader):
            encoder_input = batch[0].to(device)
            prediction_tensor = \
                greedy_decode(model, device, encoder_input, max_decoding_len,
                              start_token_id)

            predictions.extend([''.join(e) for e in text_encoder.id2token(prediction_tensor.cpu().numpy(),
                                                                          unframe=True, lang_key='ru')])
    return np.array(predictions)


def train(source_strings, target_strings, n_epochs, smooth, train_config = None, dropout = None):
    '''Common training cycle for final run (fixed hyperparameters,
    no evaluation during training)'''
    if torch.cuda.is_available():
        device = torch.device('cuda')
        print(f'Using GPU device: {device}')
    else:
        device = torch.device('cpu')
        print(f'GPU is not available, using CPU device {device}')

    train_df = pd.DataFrame({'en': source_strings, 'ru': target_strings})
    text_encoder = TextEncoder()
    text_encoder.make_vocabs(train_df)

    if dropout is None:
        # default dropout rates, used when the caller does not supply any
        dropout = {
            'embedding': 0.15,
            'attention': 0.1,
            'residual': 0.15,
            'relu': 0.2
        }
    model_config = {
        'src_vocab_size': text_encoder.src_vocab_size,
        'tgt_vocab_size': text_encoder.tgt_vocab_size,
        'max_src_seq_length': max(train_df['en'].aggregate(len)) + 2, #including start_token and end_token
        'max_tgt_seq_length': max(train_df['ru'].aggregate(len)) + 2,
        'n_layers': 2,
        'n_heads': 2,
        'hidden_size': 128,
        'ff_hidden_size': 256,
        'dropout': dropout,
        'pad_idx': 0
    }

    model = prepare_model(model_config)
    model.to(device)

    if train_config is None:
        train_config = {
            'batch_size': 200,
            'n_epochs': n_epochs,
            'lr_scheduler': {
                'type': 'warmup,decay_linear',
                'warmup_steps_part': 0.1,
                'lr_peak': 5e-4,
            },
        }

    #Model training procedure
    optimizer = torch.optim.Adam(model.parameters(), lr=0.)
    n_steps = (len(train_df) // train_config['batch_size'] + 1) * train_config['n_epochs']
    lr_scheduler = LrScheduler(n_steps, **train_config['lr_scheduler'])

    # prepare train data
    source_strings, target_strings = zip(*sorted(zip(source_strings, target_strings),
                                                 key=lambda e: len(e[0])))
    train_dataloader = create_dataloader(source_strings, target_strings, text_encoder,
                                         train_config['batch_size'],
                                         shuffle_batches_each_epoch=True)
    # training cycle
    for epoch in range(1,train_config['n_epochs']+1):
        print('\n' + '-'*40)
        print(f'Epoch: {epoch}')
        print(f'Run training...')
        model.train()
        run_epoch(train_dataloader, model,
                  lr_scheduler, optimizer,smooth, device=device, verbose=False)
    learnable_params = {
        'model': model,
        'text_encoder': text_encoder,
    }
    return learnable_params

def classify(source_strings, learnable_params):
    if torch.cuda.is_available():
        device = torch.device('cuda')
        print(f'Using GPU device: {device}')
    else:
        device = torch.device('cpu')
        print(f'GPU is not available, using CPU device {device}')

    model = learnable_params['model']
    text_encoder = learnable_params['text_encoder']
    batch_size = 200
    dataloader = create_dataloader(source_strings, None, text_encoder,
                                   batch_size, shuffle_batches_each_epoch=False)
    max_decoding_len = model.config['max_tgt_seq_length']
    predictions = generate_predictions(dataloader, max_decoding_len, text_encoder, model, device)
    #return single top1 prediction for each sample
    return np.expand_dims(predictions, 1)

## Modifying the structure
To make the code easier to run, I reorganized the rest of this notebook into the following sections:
- Label smoothing
- Hyper-parameters choice
- Describe the experiments and results
- Training 


### Label smoothing

We suggest implementing an additional regularization method: **label smoothing**. Suppose that at position t in the token sequence we have a predicted vector of probabilities, one for each token id in the vocabulary. Cross-entropy compares it with the one-hot ground truth representation

$$[0, ... 0, 1, 0, ..., 0].$$

Now suppose that we slightly "smooth" the values in the ground truth vector and obtain

$$\left[\frac{\alpha}{|V|}, \ldots, \frac{\alpha}{|V|}, (1-\alpha)+\frac{\alpha}{|V|}, \frac{\alpha}{|V|}, \ldots, \frac{\alpha}{|V|}\right],$$

where $\alpha$ is a parameter between 0 and 1 and $|V|$ is the vocabulary size, i.e. the number of components in the ground truth vector. The values of this new vector still sum to 1. Compute the cross-entropy between our prediction vector and the new ground truth. Now, first, the cross-entropy will never reach 0. Second, the loss still requires the model to assign the correct token the highest probability among all vocabulary entries, but not one that is too large, because as this probability approaches 1 the value of the loss function increases. For research on the use of label smoothing, see the [paper](https://arxiv.org/abs/1906.02629).

Accordingly, to embed label smoothing into the model, it is necessary to apply the transformation described above to the ground truth vectors and to implement the cross-entropy calculation yourself. The `torch.nn.CrossEntropyLoss` class is not quite suitable, because its `__call__` method takes the id of the correct token as the ground truth and builds the one-hot vector internally. However, it is possible to implement what is required based on the internal implementation of this class, [CrossEntropyLoss](https://pytorch.org/docs/stable/_modules/torch/nn/modules/loss.html#CrossEntropyLoss).
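The transformation and the resulting loss can be illustrated for a single token position in plain Python. This is a sketch under stated assumptions, not the PyTorch implementation the task asks for: the function names `smooth_targets`, `log_softmax` and `smoothed_cross_entropy` are hypothetical, and the smoothing follows the formula above ($\alpha/|V|$ on every entry plus $1-\alpha$ on the correct one).

```python
import math


def smooth_targets(target_id, vocab_size, alpha):
    """Smoothed ground truth: alpha/|V| everywhere, plus (1 - alpha) on the target."""
    dist = [alpha / vocab_size] * vocab_size
    dist[target_id] += 1.0 - alpha
    return dist


def log_softmax(logits):
    """Numerically stable log-softmax over a list of raw scores."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def smoothed_cross_entropy(logits, target_id, alpha):
    """Cross-entropy between the smoothed ground truth and the predicted distribution."""
    targets = smooth_targets(target_id, len(logits), alpha)
    log_probs = log_softmax(logits)
    return -sum(t * lp for t, lp in zip(targets, log_probs))


if __name__ == "__main__":
    confident = [0.0, 0.0, 10.0]  # nearly all probability mass on token 2
    print(smoothed_cross_entropy(confident, 2, alpha=0.0))
    print(smoothed_cross_entropy(confident, 2, alpha=0.1))
```

With `alpha=0.0` the function reduces to ordinary cross-entropy, while with `alpha=0.1` the very confident prediction is penalized noticeably more, which is the regularizing effect described above.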
    

Test different values of $\alpha$ (e.g., 0.05, 0.1, 0.2). Describe your experiments and results.


#### SmoothCrossEntropyLoss

In [None]:
from torch.autograd import Variable
from torch.nn.modules.loss import _WeightedLoss

class SmoothCrossEntropyLoss(_WeightedLoss):
    def __init__(self, weight=None, reduction='mean', smoothing=0.0):
        super().__init__(weight=weight, reduction=reduction)
        self.smoothing = smoothing
        self.weight = weight
        self.reduction = reduction

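    def k_one_hot(self, targets:torch.Tensor, n_classes:int, smoothing=0.0):
        # Build smoothed targets: the correct class gets 1 - smoothing and the
        # remaining n_classes - 1 classes share the smoothing mass equally.
        # This differs slightly from the formula above, which spreads alpha
        # over all |V| entries, including the correct one.
        with torch.no_grad():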
            targets = torch.empty(size=(targets.size(0), n_classes),
                                  device=targets.device) \
                                  .fill_(smoothing /(n_classes-1)) \
                                  .scatter_(1, targets.data.unsqueeze(1), 1.-smoothing)
        return targets

    def reduce_loss(self, loss):
        return loss.mean() if self.reduction == 'mean' else loss.sum() \
        if self.reduction == 'sum' else loss

    def forward(self, inputs, targets):
        assert 0 <= self.smoothing < 1

        # apply label smoothing to ground truth labels
        targets = self.k_one_hot(targets, inputs.size(-1), self.smoothing)
        log_preds = F.log_softmax(inputs, -1)

        if self.weight is not None:
            log_preds = log_preds * self.weight.unsqueeze(0)

        return self.reduce_loss(-(targets * log_preds).sum(dim=-1))

###  Hyper-parameters choice

The model is ready. Now we need to find the optimal hyper-parameters.

The quality of models with different hyper-parameters should be monitored on the dev or dev_small samples (to save time, since generating transliterations is a rather time-consuming process, comparable to one training epoch).

To generate predictions, you can use the `generate_predictions` function; to calculate the accuracy@1 metric, you can then use the `compute_metrics` function.



Hyper-parameters are stored in the dictionaries `model_config` and `train_config` in the train function. It is suggested to leave the following hyper-parameters in `model_config` and `train_config` unmodified:

* n_layers $=$ 2
* n_heads $=$ 2
* hidden_size $=$ 128
* ff_hidden_size $=$ 256
* warmup_steps_part $=$ 0.1
* batch_size $=$ 200

You can vary the dropout value. The model has 4 types of dropout: ***embedding dropout***, applied to the embeddings before they are sent to the first layer of the Encoder or Decoder; ***attention dropout***, applied to the attention weights in the MultiHeadAttention layer; ***residual dropout***, applied to the output of each sublayer (MultiHeadAttention or FeedForward) in the Encoder and Decoder layers; and, finally, ***relu dropout***, used in the FeedForward layer. For all 4 types it is suggested to test the same dropout value from the list: 0.1, 0.15, 0.2.
It is also suggested to test several peak levels of the learning rate, **lr_peak**: 5e-4, 1e-3, 2e-3.

Note that on a GPU, training one epoch takes about 1 minute and requires up to 1 GB of video memory. On a CPU, training is about 2 times slower. If you run out of RAM or video memory, reduce the batch size; in that case the optimal range of learning rate values will change and must be determined again. With batch_size $=$ 200, training takes at least 300 epochs to reach an accuracy of 0.66 on the dev_small dataset.
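The suggested search space is small enough to enumerate explicitly. The sketch below builds the 9 combinations (one shared dropout value times one peak learning rate) in the dictionary layout used by `train`. The helper name `build_grid` is hypothetical, and passing each pair to `train` is left to the reader.

```python
from itertools import product

DROPOUT_VALUES = [0.1, 0.15, 0.2]
LR_PEAK_VALUES = [5e-4, 1e-3, 2e-3]
DROPOUT_TYPES = ('embedding', 'attention', 'residual', 'relu')


def build_grid(n_epochs):
    """Return a list of (dropout, train_config) pairs covering the suggested values."""
    grid = []
    for p, lr_peak in product(DROPOUT_VALUES, LR_PEAK_VALUES):
        # the same dropout value is used for all 4 dropout types
        dropout = {name: p for name in DROPOUT_TYPES}
        train_config = {
            'batch_size': 200,
            'n_epochs': n_epochs,
            'lr_scheduler': {
                'type': 'warmup,decay_linear',
                'warmup_steps_part': 0.1,
                'lr_peak': lr_peak,
            },
        }
        grid.append((dropout, train_config))
    return grid


if __name__ == "__main__":
    for dropout, train_config in build_grid(n_epochs=300):
        print(dropout['embedding'], train_config['lr_scheduler']['lr_peak'])
```

An exhaustive grid like this is an alternative to a sampled search when each dropout type is tied to the same value; varying the 4 dropout types independently would give a much larger space.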

In [None]:
PREDS_FNAME = "preds_translit.tsv"
SCORED_PARTS1 = ('train_small', 'dev_small')
TRANSLIT_PATH = "TRANSLIT"

In [None]:
!pip install optuna


In [None]:
if torch.cuda.is_available():
    device = torch.device('cuda')
else:
    device = torch.device('cpu')

#### No Label Smoothing

In [None]:
import optuna
from sklearn.model_selection import train_test_split

# define the hyper-parameter space to search
def objective(trial):
    part2ixy = load_dataset(TRANSLIT_PATH, parts=SCORED_PARTS1)
    train_ids, train_strings, train_transliterations = part2ixy['train_small']
    val_ids, val_strings, val_transliterations = part2ixy['dev_small']
    dropout = {
            'embedding': trial.suggest_categorical('dropout_embedding', [0.1, 0.15, 0.2]),
            'attention': trial.suggest_categorical('dropout_attention', [0.1, 0.15, 0.2]),
            'residual': trial.suggest_categorical('dropout_residual', [0.1, 0.15, 0.2]),
            'relu': trial.suggest_categorical('dropout_relu', [0.1, 0.15, 0.2])
        }
    train_config = {
        'batch_size': 200, 'n_epochs': 1, 
        'lr_scheduler': {
        'type': 'warmup,decay_linear',
        'warmup_steps_part': 0.1,
            'lr_peak': trial.suggest_categorical('lr_peak', [3e-4, 5e-4, 1e-3, 2e-3]),
        },
    }
    
    # train the model with the current hyper-parameters
    learnable_params = train(train_strings, train_transliterations, 1, 0, train_config, dropout)
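    for part, (ids, x, y) in part2ixy.items():
        # transliterate the source strings x and score them against the references y;
        # the value returned below comes from the last part iterated
        preds = classify(x, learnable_params)
        metric_values = compute_metrics(np.squeeze(preds), y, ['mean_ld@1'])
    # the study maximizes the objective, so the mean_ld@1 metric is inverted
    return 1 / metric_values['mean_ld@1']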

# run the hyper-parameter search with Optuna
study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=301)

# print the best hyper-parameter values and the corresponding objective score
print('Best trial:')
trial = study.best_trial
print(f'  Score: {trial.value:.3f}')
print('  Params:')
for key, value in trial.params.items():
    print(f'    {key}: {value}')

[I 2023-04-11 08:46:16,118] A new study created in memory with name: no-name-4bf2bf28-b1f9-4526-b7a1-480ec7b3b752


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.69it/s]


** End of epoch, accumulated average loss = 5.384475 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.48it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.32it/s]
[I 2023-04-11 08:46:19,711] Trial 0 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.2, 'dropout_relu': 0.15, 'lr_peak': 0.002}. Best is trial 0 with value: 0.1505343971097396.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 20.54it/s]


** End of epoch, accumulated average loss = 4.975340 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.65it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.03it/s]
[I 2023-04-11 08:46:23,065] Trial 1 finished with value: 0.1505457282649605 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.2, 'dropout_relu': 0.15, 'lr_peak': 0.002}. Best is trial 1 with value: 0.1505457282649605.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 34.76it/s]


** End of epoch, accumulated average loss = 5.037618 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.33it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.06it/s]
[I 2023-04-11 08:46:25,873] Trial 2 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.1, 'dropout_residual': 0.15, 'dropout_relu': 0.15, 'lr_peak': 0.0003}. Best is trial 1 with value: 0.1505457282649605.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 34.32it/s]


** End of epoch, accumulated average loss = 4.941776 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.31it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.05it/s]
[I 2023-04-11 08:46:28,693] Trial 3 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0003}. Best is trial 1 with value: 0.1505457282649605.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.53it/s]


** End of epoch, accumulated average loss = 5.122521 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.92it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.60it/s]
[I 2023-04-11 08:46:31,874] Trial 4 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.2, 'dropout_relu': 0.15, 'lr_peak': 0.002}. Best is trial 1 with value: 0.1505457282649605.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 23.33it/s]


** End of epoch, accumulated average loss = 4.844546 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.23it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.14it/s]
[I 2023-04-11 08:46:35,251] Trial 5 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 1 with value: 0.1505457282649605.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 34.30it/s]


** End of epoch, accumulated average loss = 4.941261 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.01it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.08it/s]
[I 2023-04-11 08:46:38,106] Trial 6 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.15, 'lr_peak': 0.002}. Best is trial 1 with value: 0.1505457282649605.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.25it/s]


** End of epoch, accumulated average loss = 5.115329 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.75it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.09it/s]
[I 2023-04-11 08:46:41,012] Trial 7 finished with value: 0.15106881184379484 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.15, 'lr_peak': 0.001}. Best is trial 7 with value: 0.15106881184379484.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 34.63it/s]


** End of epoch, accumulated average loss = 5.321832 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.28it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.52it/s]
[I 2023-04-11 08:46:43,919] Trial 8 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 7 with value: 0.15106881184379484.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 15.86it/s]


** End of epoch, accumulated average loss = 4.772420 **
** Elapsed time: 0:00:01**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.72it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.22it/s]
[I 2023-04-11 08:46:47,633] Trial 9 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.2, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 7 with value: 0.15106881184379484.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 35.17it/s]


** End of epoch, accumulated average loss = 5.011247 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.27it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.36it/s]
[I 2023-04-11 08:46:50,408] Trial 10 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.1, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.001}. Best is trial 7 with value: 0.15106881184379484.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.64it/s]


** End of epoch, accumulated average loss = 4.956176 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.06it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.24it/s]
[I 2023-04-11 08:46:53,237] Trial 11 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.2, 'dropout_relu': 0.15, 'lr_peak': 0.001}. Best is trial 7 with value: 0.15106881184379484.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 23.93it/s]


** End of epoch, accumulated average loss = 4.732324 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:02<00:00,  4.62it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.09it/s]
[I 2023-04-11 08:46:57,558] Trial 12 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.2, 'dropout_relu': 0.15, 'lr_peak': 0.001}. Best is trial 7 with value: 0.15106881184379484.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 12.23it/s]


** End of epoch, accumulated average loss = 5.184418 **
** Elapsed time: 0:00:01**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.22it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.00it/s]
[I 2023-04-11 08:47:01,677] Trial 13 finished with value: 0.16038492381716118 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 13 with value: 0.16038492381716118.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.53it/s]


** End of epoch, accumulated average loss = 4.799354 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.13it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.15it/s]
[I 2023-04-11 08:47:05,239] Trial 14 finished with value: 0.17022725338326666 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.1, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 19.26it/s]


** End of epoch, accumulated average loss = 5.112024 **
** Elapsed time: 0:00:01**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.64it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.00it/s]
[I 2023-04-11 08:47:08,451] Trial 15 finished with value: 0.150591069949552 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.1, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 34.27it/s]


** End of epoch, accumulated average loss = 4.853657 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.13it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.03it/s]
[I 2023-04-11 08:47:12,125] Trial 16 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.1, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.45it/s]


** End of epoch, accumulated average loss = 4.923136 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.92it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.19it/s]
[I 2023-04-11 08:47:15,002] Trial 17 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.1, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.50it/s]


** End of epoch, accumulated average loss = 4.772950 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.15it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.00it/s]
[I 2023-04-11 08:47:17,875] Trial 18 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.1, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.34it/s]


** End of epoch, accumulated average loss = 4.839410 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.09it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.01it/s]
[I 2023-04-11 08:47:20,753] Trial 19 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.31it/s]


** End of epoch, accumulated average loss = 5.026824 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.93it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.82it/s]
[I 2023-04-11 08:47:24,303] Trial 20 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.1, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 27.85it/s]


** End of epoch, accumulated average loss = 4.819999 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.93it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.11it/s]
[I 2023-04-11 08:47:27,268] Trial 21 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.15, 'lr_peak': 0.001}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.63it/s]


** End of epoch, accumulated average loss = 5.414279 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.98it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.03it/s]
[I 2023-04-11 08:47:30,280] Trial 22 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.15, 'lr_peak': 0.001}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.86it/s]


** End of epoch, accumulated average loss = 4.837844 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.11it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.10it/s]
[I 2023-04-11 08:47:33,141] Trial 23 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.07it/s]


** End of epoch, accumulated average loss = 4.980338 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.45it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.87it/s]
[I 2023-04-11 08:47:36,586] Trial 24 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.15, 'lr_peak': 0.001}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.03it/s]


** End of epoch, accumulated average loss = 5.226537 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.01it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.87it/s]
[I 2023-04-11 08:47:39,679] Trial 25 finished with value: 0.15151515151515152 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.08it/s]


** End of epoch, accumulated average loss = 4.805670 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.08it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.17it/s]
[I 2023-04-11 08:47:42,566] Trial 26 finished with value: 0.16003840921821239 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.49it/s]


** End of epoch, accumulated average loss = 4.930459 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.02it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.95it/s]
[I 2023-04-11 08:47:45,450] Trial 27 finished with value: 0.15550890288469013 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.89it/s]


** End of epoch, accumulated average loss = 4.604961 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.85it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.06it/s]
[I 2023-04-11 08:47:48,750] Trial 28 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.10it/s]


** End of epoch, accumulated average loss = 4.938598 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.92it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.04it/s]
[I 2023-04-11 08:47:51,998] Trial 29 finished with value: 0.15342129487572875 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.80it/s]


** End of epoch, accumulated average loss = 4.915708 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.08it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.94it/s]
[I 2023-04-11 08:47:54,888] Trial 30 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.77it/s]


** End of epoch, accumulated average loss = 4.825533 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.81it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.94it/s]
[I 2023-04-11 08:47:57,827] Trial 31 finished with value: 0.1505683956937439 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.86it/s]


** End of epoch, accumulated average loss = 4.729928 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.11it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.39it/s]
[I 2023-04-11 08:48:01,016] Trial 32 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.40it/s]


** End of epoch, accumulated average loss = 5.191686 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.23it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.98it/s]
[I 2023-04-11 08:48:04,437] Trial 33 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.002}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 34.46it/s]


** End of epoch, accumulated average loss = 5.230182 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.96it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.08it/s]
[I 2023-04-11 08:48:07,435] Trial 34 finished with value: 0.15071590052750566 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.64it/s]


** End of epoch, accumulated average loss = 5.052619 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.14it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.89it/s]
[I 2023-04-11 08:48:10,320] Trial 35 finished with value: 0.15065913370998116 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.1, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.35it/s]


** End of epoch, accumulated average loss = 4.930159 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.01it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.77it/s]
[I 2023-04-11 08:48:13,440] Trial 36 finished with value: 0.15055706112616682 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.78it/s]


** End of epoch, accumulated average loss = 5.280439 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.76it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.14it/s]
[I 2023-04-11 08:48:16,981] Trial 37 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.002}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 34.07it/s]


** End of epoch, accumulated average loss = 4.870347 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.01it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.04it/s]
[I 2023-04-11 08:48:19,855] Trial 38 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.1, 'dropout_residual': 0.2, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 34.04it/s]


** End of epoch, accumulated average loss = 4.755529 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.95it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.96it/s]
[I 2023-04-11 08:48:22,750] Trial 39 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.15, 'lr_peak': 0.0003}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.63it/s]


** End of epoch, accumulated average loss = 5.172005 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.96it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.32it/s]
[I 2023-04-11 08:48:25,752] Trial 40 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.002}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 23.63it/s]


** End of epoch, accumulated average loss = 5.175623 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.51it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.57it/s]
[I 2023-04-11 08:48:29,438] Trial 41 finished with value: 0.15629884338855893 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.23it/s]


** End of epoch, accumulated average loss = 4.815588 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.91it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.01it/s]
[I 2023-04-11 08:48:32,350] Trial 42 finished with value: 0.16723806338322603 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 34.26it/s]


** End of epoch, accumulated average loss = 5.177668 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.01it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.64it/s]
[I 2023-04-11 08:48:35,292] Trial 43 finished with value: 0.1506931886678722 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.25it/s]


** End of epoch, accumulated average loss = 5.020109 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.08it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.08it/s]
[I 2023-04-11 08:48:38,173] Trial 44 finished with value: 0.1505457282649605 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 24.04it/s]


** End of epoch, accumulated average loss = 4.964528 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.74it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.04it/s]
[I 2023-04-11 08:48:41,847] Trial 45 finished with value: 0.1505457282649605 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.2, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.37it/s]


** End of epoch, accumulated average loss = 5.047940 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.98it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.98it/s]
[I 2023-04-11 08:48:44,740] Trial 46 finished with value: 0.15834059061040298 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.1, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 23.67it/s]


** End of epoch, accumulated average loss = 5.090206 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.72it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.83it/s]
[I 2023-04-11 08:48:47,830] Trial 47 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.1, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0003}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.55it/s]


** End of epoch, accumulated average loss = 4.834914 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.17it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.07it/s]
[I 2023-04-11 08:48:50,682] Trial 48 finished with value: 0.15062509414068384 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.1, 'dropout_residual': 0.2, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 28.59it/s]


** End of epoch, accumulated average loss = 4.782159 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.92it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.69it/s]
[I 2023-04-11 08:48:54,323] Trial 49 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.1, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.35it/s]


** End of epoch, accumulated average loss = 4.688870 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.06it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.59it/s]
[I 2023-04-11 08:48:57,270] Trial 50 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.1, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.04it/s]


** End of epoch, accumulated average loss = 4.951178 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.08it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.96it/s]
[I 2023-04-11 08:49:00,171] Trial 51 finished with value: 0.15315108354391607 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.1, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.27it/s]


** End of epoch, accumulated average loss = 4.979607 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.90it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.07it/s]
[I 2023-04-11 08:49:03,073] Trial 52 finished with value: 0.1505457282649605 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.1, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 34.06it/s]


** End of epoch, accumulated average loss = 5.009273 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.03it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.96it/s]
[I 2023-04-11 08:49:06,801] Trial 53 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.23it/s]


** End of epoch, accumulated average loss = 5.320661 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.95it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.02it/s]
[I 2023-04-11 08:49:09,708] Trial 54 finished with value: 0.16358580075249468 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.43it/s]


** End of epoch, accumulated average loss = 4.669097 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.20it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.98it/s]
[I 2023-04-11 08:49:12,571] Trial 55 finished with value: 0.15110305228165608 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.1, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.02it/s]


** End of epoch, accumulated average loss = 4.821678 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.84it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.11it/s]
[I 2023-04-11 08:49:15,481] Trial 56 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.06it/s]


** End of epoch, accumulated average loss = 5.220488 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.44it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.67it/s]
[I 2023-04-11 08:49:19,203] Trial 57 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.002}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.95it/s]


** End of epoch, accumulated average loss = 4.937386 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.95it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.90it/s]
[I 2023-04-11 08:49:22,125] Trial 58 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.1, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 34.02it/s]


** End of epoch, accumulated average loss = 4.992087 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.18it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.18it/s]
[I 2023-04-11 08:49:24,951] Trial 59 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0003}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.99it/s]


** End of epoch, accumulated average loss = 4.897258 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.95it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.87it/s]
[I 2023-04-11 08:49:27,880] Trial 60 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.2, 'dropout_relu': 0.2, 'lr_peak': 0.001}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 34.75it/s]


** End of epoch, accumulated average loss = 4.851547 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.10it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.89it/s]
[I 2023-04-11 08:49:31,495] Trial 61 finished with value: 0.16645859342488556 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 24.08it/s]


** End of epoch, accumulated average loss = 5.204629 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.14it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.15it/s]
[I 2023-04-11 08:49:34,483] Trial 62 finished with value: 0.15216068167985392 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.01it/s]


** End of epoch, accumulated average loss = 4.968283 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.23it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.18it/s]
[I 2023-04-11 08:49:37,320] Trial 63 finished with value: 0.15331544653123802 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.96it/s]


** End of epoch, accumulated average loss = 4.961377 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.14it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.00it/s]
[I 2023-04-11 08:49:40,217] Trial 64 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.98it/s]


** End of epoch, accumulated average loss = 5.111719 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.00it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.19it/s]
[I 2023-04-11 08:49:43,480] Trial 65 finished with value: 0.1543448062972681 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 23.35it/s]


** End of epoch, accumulated average loss = 4.882999 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.77it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.89it/s]
[I 2023-04-11 08:49:46,767] Trial 66 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.1, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.07it/s]


** End of epoch, accumulated average loss = 5.308059 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.85it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.14it/s]
[I 2023-04-11 08:49:49,671] Trial 67 finished with value: 0.16079755587715067 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.81it/s]


** End of epoch, accumulated average loss = 5.239037 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.07it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.13it/s]
[I 2023-04-11 08:49:52,541] Trial 68 finished with value: 0.15340952673161 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.91it/s]


** End of epoch, accumulated average loss = 4.734454 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.03it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.78it/s]
[I 2023-04-11 08:49:55,649] Trial 69 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.001}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.81it/s]


** End of epoch, accumulated average loss = 5.063038 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.93it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.16it/s]
[I 2023-04-11 08:49:59,129] Trial 70 finished with value: 0.15096618357487923 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 34.05it/s]


** End of epoch, accumulated average loss = 5.096683 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.88it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.82it/s]
[I 2023-04-11 08:50:02,066] Trial 71 finished with value: 0.15160703456640387 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.47it/s]


** End of epoch, accumulated average loss = 4.776563 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.02it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.02it/s]
[I 2023-04-11 08:50:04,958] Trial 72 finished with value: 0.15067048365225252 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.88it/s]


** End of epoch, accumulated average loss = 4.684341 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.93it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.80it/s]
[I 2023-04-11 08:50:08,221] Trial 73 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 20.56it/s]


** End of epoch, accumulated average loss = 4.812262 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.75it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.99it/s]
[I 2023-04-11 08:50:11,831] Trial 74 finished with value: 0.16196954972465175 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.69it/s]


** End of epoch, accumulated average loss = 4.888895 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.81it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.91it/s]
[I 2023-04-11 08:50:14,769] Trial 75 finished with value: 0.1505683956937439 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.50it/s]


** End of epoch, accumulated average loss = 5.463786 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.82it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.75it/s]
[I 2023-04-11 08:50:17,730] Trial 76 finished with value: 0.06591957811470006 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.002}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.74it/s]


** End of epoch, accumulated average loss = 4.900457 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.90it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.92it/s]
[I 2023-04-11 08:50:20,846] Trial 77 finished with value: 0.1576789656259855 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.49it/s]


** End of epoch, accumulated average loss = 4.822956 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.66it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.96it/s]
[I 2023-04-11 08:50:24,457] Trial 78 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.53it/s]


** End of epoch, accumulated average loss = 5.297312 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.85it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.92it/s]
[I 2023-04-11 08:50:27,400] Trial 79 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.2, 'dropout_relu': 0.2, 'lr_peak': 0.0003}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.03it/s]


** End of epoch, accumulated average loss = 4.931353 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.62it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.88it/s]
[I 2023-04-11 08:50:30,382] Trial 80 finished with value: 0.15063643895458312 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.74it/s]


** End of epoch, accumulated average loss = 4.781371 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.87it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.37it/s]
[I 2023-04-11 08:50:33,415] Trial 81 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.1, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.55it/s]


** End of epoch, accumulated average loss = 4.688607 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.52it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.75it/s]
[I 2023-04-11 08:50:37,085] Trial 82 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.39it/s]


** End of epoch, accumulated average loss = 4.778580 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.74it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.78it/s]
[I 2023-04-11 08:50:40,065] Trial 83 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.29it/s]


** End of epoch, accumulated average loss = 5.540145 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.88it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.98it/s]
[I 2023-04-11 08:50:42,993] Trial 84 finished with value: 0.15144631228229594 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.76it/s]


** End of epoch, accumulated average loss = 4.815608 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.15it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.93it/s]
[I 2023-04-11 08:50:45,877] Trial 85 finished with value: 0.1507386192342478 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.1, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 23.02it/s]


** End of epoch, accumulated average loss = 4.578864 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.50it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.35it/s]
[I 2023-04-11 08:50:49,801] Trial 86 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.76it/s]


** End of epoch, accumulated average loss = 4.932938 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.96it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.10it/s]
[I 2023-04-11 08:50:52,718] Trial 87 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 34.41it/s]


** End of epoch, accumulated average loss = 4.829871 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.20it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.90it/s]
[I 2023-04-11 08:50:55,585] Trial 88 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.1, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.001}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 34.16it/s]


** End of epoch, accumulated average loss = 4.764888 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.04it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.85it/s]
[I 2023-04-11 08:50:58,504] Trial 89 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 25.57it/s]


** End of epoch, accumulated average loss = 5.369987 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.61it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.70it/s]
[I 2023-04-11 08:51:02,278] Trial 90 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.15, 'lr_peak': 0.002}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 34.25it/s]


** End of epoch, accumulated average loss = 5.068149 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.88it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.10it/s]
[I 2023-04-11 08:51:05,177] Trial 91 finished with value: 0.14592149423610098 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 34.04it/s]


** End of epoch, accumulated average loss = 5.029466 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.08it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.91it/s]
[I 2023-04-11 08:51:08,069] Trial 92 finished with value: 0.150681835304754 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.81it/s]


** End of epoch, accumulated average loss = 4.835004 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.88it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.80it/s]
[I 2023-04-11 08:51:11,020] Trial 93 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 34.54it/s]


** End of epoch, accumulated average loss = 5.214710 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.74it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.24it/s]
[I 2023-04-11 08:51:14,743] Trial 94 finished with value: 0.15121729925903524 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.41it/s]


** End of epoch, accumulated average loss = 4.912413 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.94it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.94it/s]
[I 2023-04-11 08:51:17,659] Trial 95 finished with value: 0.13202191563799592 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.95it/s]


** End of epoch, accumulated average loss = 4.822257 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.95it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.77it/s]
[I 2023-04-11 08:51:20,620] Trial 96 finished with value: 0.15055706112616682 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.78it/s]


** End of epoch, accumulated average loss = 4.768421 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.00it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.02it/s]
[I 2023-04-11 08:51:23,529] Trial 97 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.1, 'dropout_residual': 0.2, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 23.10it/s]


** End of epoch, accumulated average loss = 4.848200 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.90it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.05it/s]
[I 2023-04-11 08:51:27,400] Trial 98 finished with value: 0.15080681646810434 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0003}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.09it/s]


** End of epoch, accumulated average loss = 4.802568 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.87it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.72it/s]
[I 2023-04-11 08:51:30,375] Trial 99 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.21it/s]


** End of epoch, accumulated average loss = 5.195605 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.88it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.07it/s]
[I 2023-04-11 08:51:33,294] Trial 100 finished with value: 0.15100037750094375 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.1, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.73it/s]


** End of epoch, accumulated average loss = 4.864673 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.91it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.93it/s]
[I 2023-04-11 08:51:36,224] Trial 101 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.98it/s]


** End of epoch, accumulated average loss = 4.736954 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.19it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.76it/s]
[I 2023-04-11 08:51:39,991] Trial 102 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.74it/s]


** End of epoch, accumulated average loss = 4.877045 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.83it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.77it/s]
[I 2023-04-11 08:51:42,952] Trial 103 finished with value: 0.150591069949552 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.31it/s]


** End of epoch, accumulated average loss = 5.502030 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.89it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.92it/s]
[I 2023-04-11 08:51:45,903] Trial 104 finished with value: 0.1505683956937439 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.11it/s]


** End of epoch, accumulated average loss = 4.975680 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.88it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.84it/s]
[I 2023-04-11 08:51:48,843] Trial 105 finished with value: 0.1519872330724219 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.38it/s]


** End of epoch, accumulated average loss = 5.259072 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.59it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.56it/s]
[I 2023-04-11 08:51:52,571] Trial 106 finished with value: 0.15138899402013473 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.32it/s]


** End of epoch, accumulated average loss = 4.952075 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.77it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.92it/s]
[I 2023-04-11 08:51:55,544] Trial 107 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.32it/s]


** End of epoch, accumulated average loss = 4.610661 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.86it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.74it/s]
[I 2023-04-11 08:51:58,508] Trial 108 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.1, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.66it/s]


** End of epoch, accumulated average loss = 5.051919 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.57it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.84it/s]
[I 2023-04-11 08:52:01,496] Trial 109 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.001}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.30it/s]


** End of epoch, accumulated average loss = 5.238153 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.42it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.58it/s]
[I 2023-04-11 08:52:05,377] Trial 110 finished with value: 0.1520450053215752 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.05it/s]


** End of epoch, accumulated average loss = 4.840401 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.94it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.92it/s]
[I 2023-04-11 08:52:08,353] Trial 111 finished with value: 0.15055706112616682 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 34.76it/s]


** End of epoch, accumulated average loss = 4.763509 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.89it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.89it/s]
[I 2023-04-11 08:52:11,279] Trial 112 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.19it/s]


** End of epoch, accumulated average loss = 4.786605 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.92it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.80it/s]
[I 2023-04-11 08:52:14,231] Trial 113 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.56it/s]


** End of epoch, accumulated average loss = 4.914127 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.02it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.73it/s]
[I 2023-04-11 08:52:17,808] Trial 114 finished with value: 0.15062509414068384 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.46it/s]


** End of epoch, accumulated average loss = 5.191763 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.71it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.72it/s]
[I 2023-04-11 08:52:21,009] Trial 115 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.002}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.22it/s]


** End of epoch, accumulated average loss = 4.902241 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.64it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.85it/s]
[I 2023-04-11 08:52:23,994] Trial 116 finished with value: 0.1511144692104269 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.2, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.08it/s]


** End of epoch, accumulated average loss = 5.051028 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.66it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.75it/s]
[I 2023-04-11 08:52:26,993] Trial 117 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.1, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.57it/s]


** End of epoch, accumulated average loss = 4.893464 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.22it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.86it/s]
[I 2023-04-11 08:52:30,477] Trial 118 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 20.97it/s]


** End of epoch, accumulated average loss = 4.898426 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.50it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.85it/s]
[I 2023-04-11 08:52:33,699] Trial 119 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.44it/s]


** End of epoch, accumulated average loss = 5.298341 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.83it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.88it/s]
[I 2023-04-11 08:52:36,654] Trial 120 finished with value: 0.15108022359873094 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.69it/s]


** End of epoch, accumulated average loss = 5.175346 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.64it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.90it/s]
[I 2023-04-11 08:52:39,636] Trial 121 finished with value: 0.15152663080536405 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 23.14it/s]


** End of epoch, accumulated average loss = 4.852614 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.32it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.88it/s]
[I 2023-04-11 08:52:43,234] Trial 122 finished with value: 0.15779092702169625 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.10it/s]


** End of epoch, accumulated average loss = 4.587699 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.34it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.93it/s]
[I 2023-04-11 08:52:46,480] Trial 123 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.03it/s]


** End of epoch, accumulated average loss = 4.853599 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.89it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.86it/s]
[I 2023-04-11 08:52:49,449] Trial 124 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.80it/s]


** End of epoch, accumulated average loss = 4.655355 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.75it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.94it/s]
[I 2023-04-11 08:52:52,402] Trial 125 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.34it/s]


** End of epoch, accumulated average loss = 4.894744 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.48it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.77it/s]
[I 2023-04-11 08:52:55,890] Trial 126 finished with value: 0.150681835304754 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.1, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.83it/s]


** End of epoch, accumulated average loss = 4.856772 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.80it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.95it/s]
[I 2023-04-11 08:52:59,194] Trial 127 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.80it/s]


** End of epoch, accumulated average loss = 4.914878 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.60it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.71it/s]
[I 2023-04-11 08:53:02,207] Trial 128 finished with value: 0.15081818867355404 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.25it/s]


** End of epoch, accumulated average loss = 4.705536 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.49it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.86it/s]
[I 2023-04-11 08:53:05,241] Trial 129 finished with value: 0.1505683956937439 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.87it/s]


** End of epoch, accumulated average loss = 4.659956 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.89it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.72it/s]
[I 2023-04-11 08:53:08,648] Trial 130 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.001}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.67it/s]


** End of epoch, accumulated average loss = 4.666141 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.84it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.76it/s]
[I 2023-04-11 08:53:12,000] Trial 131 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.87it/s]


** End of epoch, accumulated average loss = 4.873417 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.92it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.79it/s]
[I 2023-04-11 08:53:14,964] Trial 132 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.41it/s]


** End of epoch, accumulated average loss = 5.150847 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.73it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.90it/s]
[I 2023-04-11 08:53:17,951] Trial 133 finished with value: 0.15338599585857812 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.49it/s]


** End of epoch, accumulated average loss = 4.776486 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.98it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.98it/s]
[I 2023-04-11 08:53:21,300] Trial 134 finished with value: 0.15195259079167298 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.41it/s]


** End of epoch, accumulated average loss = 4.915471 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.62it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.67it/s]
[I 2023-04-11 08:53:24,861] Trial 135 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.1, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.73it/s]


** End of epoch, accumulated average loss = 4.703945 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.73it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.71it/s]
[I 2023-04-11 08:53:27,858] Trial 136 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.69it/s]


** End of epoch, accumulated average loss = 5.408345 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.87it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.61it/s]
[I 2023-04-11 08:53:30,860] Trial 137 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.2, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.77it/s]


** End of epoch, accumulated average loss = 5.049241 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.97it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.74it/s]
[I 2023-04-11 08:53:34,271] Trial 138 finished with value: 0.15064778547755348 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.56it/s]


** End of epoch, accumulated average loss = 4.867429 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.65it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.85it/s]
[I 2023-04-11 08:53:37,655] Trial 139 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.15, 'lr_peak': 0.002}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.38it/s]


** End of epoch, accumulated average loss = 5.114970 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.99it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.95it/s]
[I 2023-04-11 08:53:40,586] Trial 140 finished with value: 0.150681835304754 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.75it/s]


** End of epoch, accumulated average loss = 4.688114 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.90it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.93it/s]
[I 2023-04-11 08:53:43,533] Trial 141 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.99it/s]


** End of epoch, accumulated average loss = 4.785947 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.00it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.05it/s]
[I 2023-04-11 08:53:46,849] Trial 142 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 24.13it/s]


** End of epoch, accumulated average loss = 5.002072 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.19it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.79it/s]
[I 2023-04-11 08:53:50,302] Trial 143 finished with value: 0.150591069949552 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.97it/s]


** End of epoch, accumulated average loss = 4.845285 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.76it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.75it/s]
[I 2023-04-11 08:53:53,284] Trial 144 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.60it/s]


** End of epoch, accumulated average loss = 4.967612 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.87it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.64it/s]
[I 2023-04-11 08:53:56,271] Trial 145 finished with value: 0.150591069949552 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.1, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0003}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.15it/s]


** End of epoch, accumulated average loss = 4.812292 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.96it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.37it/s]
[I 2023-04-11 08:53:59,498] Trial 146 finished with value: 0.150681835304754 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.22it/s]


** End of epoch, accumulated average loss = 5.110003 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.08it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.85it/s]
[I 2023-04-11 08:54:03,176] Trial 147 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.42it/s]


** End of epoch, accumulated average loss = 5.243944 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.89it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.76it/s]
[I 2023-04-11 08:54:06,142] Trial 148 finished with value: 0.15074998115625235 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.88it/s]


** End of epoch, accumulated average loss = 4.993759 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.64it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.80it/s]
[I 2023-04-11 08:54:09,160] Trial 149 finished with value: 0.15303389700818731 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.1, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.42it/s]


** End of epoch, accumulated average loss = 4.864901 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.90it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.23it/s]
[I 2023-04-11 08:54:12,462] Trial 150 finished with value: 0.15067048365225252 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.39it/s]


** End of epoch, accumulated average loss = 4.730185 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.17it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.87it/s]
[I 2023-04-11 08:54:15,952] Trial 151 finished with value: 0.15055706112616682 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.05it/s]


** End of epoch, accumulated average loss = 5.063589 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.34it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.72it/s]
[I 2023-04-11 08:54:19,031] Trial 152 finished with value: 0.15062509414068384 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.96it/s]


** End of epoch, accumulated average loss = 5.002287 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.74it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.69it/s]
[I 2023-04-11 08:54:22,030] Trial 153 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.37it/s]


** End of epoch, accumulated average loss = 4.730037 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.93it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.12it/s]
[I 2023-04-11 08:54:25,345] Trial 154 finished with value: 0.15057973196807709 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.04it/s]


** End of epoch, accumulated average loss = 4.874549 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.13it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.88it/s]
[I 2023-04-11 08:54:28,826] Trial 155 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.38it/s]


** End of epoch, accumulated average loss = 4.972502 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.77it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.73it/s]
[I 2023-04-11 08:54:31,817] Trial 156 finished with value: 0.16760244699572613 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.88it/s]


** End of epoch, accumulated average loss = 4.865043 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.95it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.84it/s]
[I 2023-04-11 08:54:34,763] Trial 157 finished with value: 0.15118300703000984 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.09it/s]


** End of epoch, accumulated average loss = 4.768483 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.83it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.18it/s]
[I 2023-04-11 08:54:38,069] Trial 158 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.10it/s]


** End of epoch, accumulated average loss = 4.858830 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.06it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.75it/s]
[I 2023-04-11 08:54:41,622] Trial 159 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.001}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 23.40it/s]


** End of epoch, accumulated average loss = 4.919930 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.96it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.79it/s]
[I 2023-04-11 08:54:44,706] Trial 160 finished with value: 0.15186028853454822 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.16it/s]


** End of epoch, accumulated average loss = 4.939056 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.98it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.53it/s]
[I 2023-04-11 08:54:47,700] Trial 161 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.70it/s]


** End of epoch, accumulated average loss = 4.611011 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.91it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.41it/s]
[I 2023-04-11 08:54:50,937] Trial 162 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.55it/s]


** End of epoch, accumulated average loss = 4.671648 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.04it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.90it/s]
[I 2023-04-11 08:54:54,456] Trial 163 finished with value: 0.15062509414068384 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.01it/s]


** End of epoch, accumulated average loss = 4.844636 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.73it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.74it/s]
[I 2023-04-11 08:54:57,480] Trial 164 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 29.27it/s]


** End of epoch, accumulated average loss = 4.945413 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.85it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.84it/s]
[I 2023-04-11 08:55:00,488] Trial 165 finished with value: 0.15790304752881731 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.1, 'dropout_residual': 0.2, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.13it/s]


** End of epoch, accumulated average loss = 4.775928 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.82it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.50it/s]
[I 2023-04-11 08:55:03,724] Trial 166 finished with value: 0.15057973196807709 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.1, 'dropout_residual': 0.2, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.56it/s]


** End of epoch, accumulated average loss = 5.073769 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.77it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.95it/s]
[I 2023-04-11 08:55:07,288] Trial 167 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.1, 'dropout_residual': 0.2, 'dropout_relu': 0.2, 'lr_peak': 0.002}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.25it/s]


** End of epoch, accumulated average loss = 5.052515 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.77it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.68it/s]
[I 2023-04-11 08:55:10,294] Trial 168 finished with value: 0.1517565824417634 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.1, 'dropout_residual': 0.2, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.47it/s]


** End of epoch, accumulated average loss = 4.902291 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.87it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.85it/s]
[I 2023-04-11 08:55:13,258] Trial 169 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.1, 'dropout_residual': 0.2, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.70it/s]


** End of epoch, accumulated average loss = 5.189581 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.68it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.70it/s]
[I 2023-04-11 08:55:16,479] Trial 170 finished with value: 0.15154959460483444 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.1, 'dropout_residual': 0.2, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.14it/s]


** End of epoch, accumulated average loss = 4.914072 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.67it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.83it/s]
[I 2023-04-11 08:55:20,122] Trial 171 finished with value: 0.15067048365225252 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.37it/s]


** End of epoch, accumulated average loss = 4.736298 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.81it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.82it/s]
[I 2023-04-11 08:55:23,104] Trial 172 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.48it/s]


** End of epoch, accumulated average loss = 4.639923 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.81it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.69it/s]
[I 2023-04-11 08:55:26,233] Trial 173 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.85it/s]


** End of epoch, accumulated average loss = 4.760831 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.72it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.64it/s]
[I 2023-04-11 08:55:29,452] Trial 174 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.1, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 20.68it/s]


** End of epoch, accumulated average loss = 5.035582 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.76it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.77it/s]
[I 2023-04-11 08:55:33,102] Trial 175 finished with value: 0.150681835304754 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.67it/s]


** End of epoch, accumulated average loss = 4.944996 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.72it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.66it/s]
[I 2023-04-11 08:55:36,114] Trial 176 finished with value: 0.15527950310559005 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0003}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.04it/s]


** End of epoch, accumulated average loss = 5.004060 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.73it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.61it/s]
[I 2023-04-11 08:55:39,126] Trial 177 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0003}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.61it/s]


** End of epoch, accumulated average loss = 4.831731 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.72it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.49it/s]
[I 2023-04-11 08:55:42,384] Trial 178 finished with value: 0.15096618357487923 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0003}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.42it/s]


** End of epoch, accumulated average loss = 4.839657 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.91it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.52it/s]
[I 2023-04-11 08:55:46,037] Trial 179 finished with value: 0.15055706112616682 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.85it/s]


** End of epoch, accumulated average loss = 4.997652 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.64it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.82it/s]
[I 2023-04-11 08:55:49,032] Trial 180 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.18it/s]


** End of epoch, accumulated average loss = 5.123993 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.87it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.87it/s]
[I 2023-04-11 08:55:52,001] Trial 181 finished with value: 0.15063643895458312 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0003}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.60it/s]


** End of epoch, accumulated average loss = 4.990385 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.88it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.55it/s]
[I 2023-04-11 08:55:55,232] Trial 182 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0003}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 20.90it/s]


** End of epoch, accumulated average loss = 5.116264 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.91it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.88it/s]
[I 2023-04-11 08:55:58,824] Trial 183 finished with value: 0.1518141794443601 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.75it/s]


** End of epoch, accumulated average loss = 4.781740 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.58it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.93it/s]
[I 2023-04-11 08:56:01,820] Trial 184 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0003}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.26it/s]


** End of epoch, accumulated average loss = 5.056191 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.90it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.65it/s]
[I 2023-04-11 08:56:04,793] Trial 185 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.1, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.51it/s]


** End of epoch, accumulated average loss = 5.057864 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.55it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.56it/s]
[I 2023-04-11 08:56:08,062] Trial 186 finished with value: 0.15060240963855423 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.03it/s]


** End of epoch, accumulated average loss = 4.905300 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.05it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.04it/s]
[I 2023-04-11 08:56:11,705] Trial 187 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.2, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.96it/s]


** End of epoch, accumulated average loss = 4.715255 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.91it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.63it/s]
[I 2023-04-11 08:56:14,686] Trial 188 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.04it/s]


** End of epoch, accumulated average loss = 4.812244 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.88it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.66it/s]
[I 2023-04-11 08:56:17,675] Trial 189 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.93it/s]


** End of epoch, accumulated average loss = 4.716597 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.06it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.55it/s]
[I 2023-04-11 08:56:20,888] Trial 190 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.001}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.95it/s]


** End of epoch, accumulated average loss = 4.909597 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.82it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.94it/s]
[I 2023-04-11 08:56:24,448] Trial 191 finished with value: 0.15064778547755348 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.26it/s]


** End of epoch, accumulated average loss = 4.807302 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.66it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.77it/s]
[I 2023-04-11 08:56:27,448] Trial 192 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.59it/s]


** End of epoch, accumulated average loss = 4.863421 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.74it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.81it/s]
[I 2023-04-11 08:56:30,453] Trial 193 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.70it/s]


** End of epoch, accumulated average loss = 4.864184 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.84it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.91it/s]
[I 2023-04-11 08:56:33,584] Trial 194 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.99it/s]


** End of epoch, accumulated average loss = 5.115672 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.65it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.76it/s]
[I 2023-04-11 08:56:37,241] Trial 195 finished with value: 0.15601841017240034 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.59it/s]


** End of epoch, accumulated average loss = 4.973283 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.65it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.69it/s]
[I 2023-04-11 08:56:40,273] Trial 196 finished with value: 0.1511144692104269 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.75it/s]


** End of epoch, accumulated average loss = 5.062398 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.79it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.80it/s]
[I 2023-04-11 08:56:43,272] Trial 197 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.1, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.43it/s]


** End of epoch, accumulated average loss = 4.831515 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.74it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.02it/s]
[I 2023-04-11 08:56:46,412] Trial 198 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 16.99it/s]


** End of epoch, accumulated average loss = 4.820602 **
** Elapsed time: 0:00:01**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.66it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.72it/s]
[I 2023-04-11 08:56:50,204] Trial 199 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.002}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.67it/s]


** End of epoch, accumulated average loss = 4.823825 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.77it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.85it/s]
[I 2023-04-11 08:56:53,182] Trial 200 finished with value: 0.15102318205844598 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.73it/s]


** End of epoch, accumulated average loss = 4.716480 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.91it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.85it/s]
[I 2023-04-11 08:56:56,129] Trial 201 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.95it/s]


** End of epoch, accumulated average loss = 4.794116 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.70it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.92it/s]
[I 2023-04-11 08:56:59,280] Trial 202 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 23.00it/s]


** End of epoch, accumulated average loss = 4.880626 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.73it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.74it/s]
[I 2023-04-11 08:57:02,895] Trial 203 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.22it/s]


** End of epoch, accumulated average loss = 4.777667 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.88it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.53it/s]
[I 2023-04-11 08:57:05,911] Trial 204 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 29.66it/s]


** End of epoch, accumulated average loss = 4.784095 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.46it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.67it/s]
[I 2023-04-11 08:57:09,000] Trial 205 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 29.99it/s]


** End of epoch, accumulated average loss = 4.942331 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.55it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.98it/s]
[I 2023-04-11 08:57:12,207] Trial 206 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.22it/s]


** End of epoch, accumulated average loss = 4.885758 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.35it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.70it/s]
[I 2023-04-11 08:57:15,981] Trial 207 finished with value: 0.15064778547755348 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.1, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0003}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.20it/s]


** End of epoch, accumulated average loss = 5.305486 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.48it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.74it/s]
[I 2023-04-11 08:57:19,033] Trial 208 finished with value: 0.1519756838905775 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.2, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.70it/s]


** End of epoch, accumulated average loss = 4.691903 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.56it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.54it/s]
[I 2023-04-11 08:57:22,092] Trial 209 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.05it/s]


** End of epoch, accumulated average loss = 4.840068 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.63it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.64it/s]
[I 2023-04-11 08:57:25,333] Trial 210 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.84it/s]


** End of epoch, accumulated average loss = 5.102807 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.44it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.72it/s]
[I 2023-04-11 08:57:29,042] Trial 211 finished with value: 0.15376335819174292 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.45it/s]


** End of epoch, accumulated average loss = 4.761552 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.60it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.81it/s]
[I 2023-04-11 08:57:32,066] Trial 212 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.81it/s]


** End of epoch, accumulated average loss = 4.954339 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.83it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.60it/s]
[I 2023-04-11 08:57:35,189] Trial 213 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.25it/s]


** End of epoch, accumulated average loss = 4.795117 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.71it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.39it/s]
[I 2023-04-11 08:57:38,471] Trial 214 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.78it/s]


** End of epoch, accumulated average loss = 5.002930 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.69it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.61it/s]
[I 2023-04-11 08:57:42,132] Trial 215 finished with value: 0.1603077909586406 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.35it/s]


** End of epoch, accumulated average loss = 4.857963 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.59it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.67it/s]
[I 2023-04-11 08:57:45,193] Trial 216 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.84it/s]


** End of epoch, accumulated average loss = 4.763137 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.69it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.50it/s]
[I 2023-04-11 08:57:48,271] Trial 217 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.69it/s]


** End of epoch, accumulated average loss = 4.640261 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.83it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.41it/s]
[I 2023-04-11 08:57:51,529] Trial 218 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.1, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 23.50it/s]


** End of epoch, accumulated average loss = 4.950306 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.70it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.75it/s]
[I 2023-04-11 08:57:55,145] Trial 219 finished with value: 0.1579778830963665 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.77it/s]


** End of epoch, accumulated average loss = 5.337518 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.82it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.69it/s]
[I 2023-04-11 08:57:58,149] Trial 220 finished with value: 0.15888147442008263 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.15it/s]


** End of epoch, accumulated average loss = 5.162904 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.75it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.97it/s]
[I 2023-04-11 08:58:01,112] Trial 221 finished with value: 0.15153811183512653 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.95it/s]


** End of epoch, accumulated average loss = 4.786291 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.74it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.55it/s]
[I 2023-04-11 08:58:04,354] Trial 222 finished with value: 0.1512973749905439 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.87it/s]


** End of epoch, accumulated average loss = 5.000525 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.79it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.58it/s]
[I 2023-04-11 08:58:07,975] Trial 223 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.22it/s]


** End of epoch, accumulated average loss = 4.916005 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.81it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.69it/s]
[I 2023-04-11 08:58:10,989] Trial 224 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.65it/s]


** End of epoch, accumulated average loss = 4.821799 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.71it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.62it/s]
[I 2023-04-11 08:58:14,165] Trial 225 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.36it/s]


** End of epoch, accumulated average loss = 5.158507 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.62it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.41it/s]
[I 2023-04-11 08:58:17,470] Trial 226 finished with value: 0.1574555188159345 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 19.88it/s]


** End of epoch, accumulated average loss = 5.144529 **
** Elapsed time: 0:00:01**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.87it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.71it/s]
[I 2023-04-11 08:58:21,130] Trial 227 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.01it/s]


** End of epoch, accumulated average loss = 4.778846 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.58it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.77it/s]
[I 2023-04-11 08:58:24,178] Trial 228 finished with value: 0.15287013681877246 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.65it/s]


** End of epoch, accumulated average loss = 4.830304 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.64it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.33it/s]
[I 2023-04-11 08:58:27,297] Trial 229 finished with value: 0.1505683956937439 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0003}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.51it/s]


** End of epoch, accumulated average loss = 4.730544 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.71it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.25it/s]
[I 2023-04-11 08:58:30,635] Trial 230 finished with value: 0.15057973196807709 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.31it/s]


** End of epoch, accumulated average loss = 5.092002 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.00it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.44it/s]
[I 2023-04-11 08:58:34,246] Trial 231 finished with value: 0.15088645794039984 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.28it/s]


** End of epoch, accumulated average loss = 5.157276 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.63it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.62it/s]
[I 2023-04-11 08:58:37,304] Trial 232 finished with value: 0.15748031496062992 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.55it/s]


** End of epoch, accumulated average loss = 5.020666 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.48it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.77it/s]
[I 2023-04-11 08:58:40,370] Trial 233 finished with value: 0.1517105362967458 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.23it/s]


** End of epoch, accumulated average loss = 5.244075 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.75it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.20it/s]
[I 2023-04-11 08:58:43,697] Trial 234 finished with value: 0.15140045420136258 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.23it/s]


** End of epoch, accumulated average loss = 4.730028 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.85it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.67it/s]
[I 2023-04-11 08:58:47,326] Trial 235 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.87it/s]


** End of epoch, accumulated average loss = 4.941213 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.52it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.72it/s]
[I 2023-04-11 08:58:50,388] Trial 236 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.84it/s]


** End of epoch, accumulated average loss = 4.883193 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.74it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.72it/s]
[I 2023-04-11 08:58:53,408] Trial 237 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.001}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.86it/s]


** End of epoch, accumulated average loss = 4.895248 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.81it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.00it/s]
[I 2023-04-11 08:58:56,786] Trial 238 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 23.54it/s]


** End of epoch, accumulated average loss = 4.938042 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.17it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.52it/s]
[I 2023-04-11 08:59:00,447] Trial 239 finished with value: 0.1515610791148833 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.1, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.59it/s]


** End of epoch, accumulated average loss = 4.908312 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.68it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.53it/s]
[I 2023-04-11 08:59:03,507] Trial 240 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.09it/s]


** End of epoch, accumulated average loss = 4.777527 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.71it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.79it/s]
[I 2023-04-11 08:59:06,518] Trial 241 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.25it/s]


** End of epoch, accumulated average loss = 4.784386 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.80it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.89it/s]
[I 2023-04-11 08:59:09,949] Trial 242 finished with value: 0.15157256536566882 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.27it/s]


** End of epoch, accumulated average loss = 4.798602 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.16it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.75it/s]
[I 2023-04-11 08:59:13,487] Trial 243 finished with value: 0.1521838380763963 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.26it/s]


** End of epoch, accumulated average loss = 5.117649 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.45it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.66it/s]
[I 2023-04-11 08:59:16,570] Trial 244 finished with value: 0.15284677111196024 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.2, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 29.43it/s]


** End of epoch, accumulated average loss = 4.842391 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.36it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.58it/s]
[I 2023-04-11 08:59:19,719] Trial 245 finished with value: 0.1538816649996153 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.86it/s]


** End of epoch, accumulated average loss = 4.771256 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.58it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.74it/s]
[I 2023-04-11 08:59:23,219] Trial 246 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 19.59it/s]


** End of epoch, accumulated average loss = 5.117020 **
** Elapsed time: 0:00:01**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.46it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.54it/s]
[I 2023-04-11 08:59:26,760] Trial 247 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.002}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.14it/s]


** End of epoch, accumulated average loss = 4.933852 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.43it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.59it/s]
[I 2023-04-11 08:59:29,879] Trial 248 finished with value: 0.15061375103546953 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.21it/s]


** End of epoch, accumulated average loss = 4.905195 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.83it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.52it/s]
[I 2023-04-11 08:59:32,922] Trial 249 finished with value: 0.15183723048891587 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.40it/s]


** End of epoch, accumulated average loss = 5.134903 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.22it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.83it/s]
[I 2023-04-11 08:59:36,452] Trial 250 finished with value: 0.15081818867355404 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.1, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 20.65it/s]


** End of epoch, accumulated average loss = 4.718026 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.79it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.60it/s]
[I 2023-04-11 08:59:40,029] Trial 251 finished with value: 0.1505683956937439 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.93it/s]


** End of epoch, accumulated average loss = 5.110925 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.68it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.73it/s]
[I 2023-04-11 08:59:43,059] Trial 252 finished with value: 0.15080681646810434 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.41it/s]


** End of epoch, accumulated average loss = 4.981578 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.40it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.61it/s]
[I 2023-04-11 08:59:46,155] Trial 253 finished with value: 0.1508523155830442 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0003}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.53it/s]


** End of epoch, accumulated average loss = 4.847614 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.70it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.94it/s]
[I 2023-04-11 08:59:49,749] Trial 254 finished with value: 0.1586671955573185 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 19.77it/s]


** End of epoch, accumulated average loss = 5.140789 **
** Elapsed time: 0:00:01**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.12it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.69it/s]
[I 2023-04-11 08:59:53,120] Trial 255 finished with value: 0.1513317191283293 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.1, 'dropout_residual': 0.2, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.67it/s]


** End of epoch, accumulated average loss = 4.786072 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.49it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.54it/s]
[I 2023-04-11 08:59:56,204] Trial 256 finished with value: 0.15239256324291373 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.02it/s]


** End of epoch, accumulated average loss = 4.821634 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.54it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.61it/s]
[I 2023-04-11 08:59:59,281] Trial 257 finished with value: 0.1516990291262136 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.01it/s]


** End of epoch, accumulated average loss = 4.767319 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.88it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.84it/s]
[I 2023-04-11 09:00:02,889] Trial 258 finished with value: 0.1556541365086777 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.10it/s]


** End of epoch, accumulated average loss = 5.013436 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.22it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.54it/s]
[I 2023-04-11 09:00:06,239] Trial 259 finished with value: 0.15241579027587257 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.39it/s]


** End of epoch, accumulated average loss = 4.915187 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.40it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.77it/s]
[I 2023-04-11 09:00:09,302] Trial 260 finished with value: 0.16182539040375435 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 28.61it/s]


** End of epoch, accumulated average loss = 4.728769 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.68it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.75it/s]
[I 2023-04-11 09:00:12,378] Trial 261 finished with value: 0.15152663080536405 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.43it/s]


** End of epoch, accumulated average loss = 5.030726 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.71it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.58it/s]
[I 2023-04-11 09:00:16,100] Trial 262 finished with value: 0.15055706112616682 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.1, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.24it/s]


** End of epoch, accumulated average loss = 4.524482 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.22it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.72it/s]
[I 2023-04-11 09:00:19,389] Trial 263 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.83it/s]


** End of epoch, accumulated average loss = 4.870572 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.45it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.58it/s]
[I 2023-04-11 09:00:22,593] Trial 264 finished with value: 0.1591343093570974 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.07it/s]


** End of epoch, accumulated average loss = 4.923325 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.60it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.53it/s]
[I 2023-04-11 09:00:25,677] Trial 265 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.48it/s]


** End of epoch, accumulated average loss = 4.822202 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.75it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.54it/s]
[I 2023-04-11 09:00:29,410] Trial 266 finished with value: 0.15062509414068384 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.01it/s]


** End of epoch, accumulated average loss = 4.919358 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.59it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.58it/s]
[I 2023-04-11 09:00:32,663] Trial 267 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.2, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.40it/s]


** End of epoch, accumulated average loss = 5.349345 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.61it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.71it/s]
[I 2023-04-11 09:00:35,701] Trial 268 finished with value: 0.15082956259426847 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.63it/s]


** End of epoch, accumulated average loss = 5.130686 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.63it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.48it/s]
[I 2023-04-11 09:00:38,783] Trial 269 finished with value: 0.150681835304754 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.14it/s]


** End of epoch, accumulated average loss = 5.007814 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.50it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.78it/s]
[I 2023-04-11 09:00:42,498] Trial 270 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 25.29it/s]


** End of epoch, accumulated average loss = 4.935053 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.67it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.65it/s]
[I 2023-04-11 09:00:45,676] Trial 271 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.001}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.50it/s]


** End of epoch, accumulated average loss = 4.939687 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.75it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.48it/s]
[I 2023-04-11 09:00:48,732] Trial 272 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.19it/s]


** End of epoch, accumulated average loss = 5.320506 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.51it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.56it/s]
[I 2023-04-11 09:00:51,834] Trial 273 finished with value: 0.15110305228165608 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.1, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.23it/s]


** End of epoch, accumulated average loss = 4.764928 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.42it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.69it/s]
[I 2023-04-11 09:00:55,575] Trial 274 finished with value: 0.15060240963855423 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 24.86it/s]


** End of epoch, accumulated average loss = 5.092177 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.87it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.61it/s]
[I 2023-04-11 09:00:58,722] Trial 275 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.71it/s]


** End of epoch, accumulated average loss = 5.032091 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.55it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.16it/s]
[I 2023-04-11 09:01:02,019] Trial 276 finished with value: 0.15234613040828762 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.37it/s]


** End of epoch, accumulated average loss = 4.576258 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.79it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.69it/s]
[I 2023-04-11 09:01:05,027] Trial 277 finished with value: 0.1505457282649605 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.15it/s]


** End of epoch, accumulated average loss = 5.083223 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.31it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.44it/s]
[I 2023-04-11 09:01:08,872] Trial 278 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.002}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.97it/s]


** End of epoch, accumulated average loss = 4.905997 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.40it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.59it/s]
[I 2023-04-11 09:01:12,024] Trial 279 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.1, 'dropout_residual': 0.2, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.64it/s]


** End of epoch, accumulated average loss = 4.816018 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.72it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.46it/s]
[I 2023-04-11 09:01:15,100] Trial 280 finished with value: 0.151894888736994 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.51it/s]


** End of epoch, accumulated average loss = 4.772669 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.86it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.53it/s]
[I 2023-04-11 09:01:18,133] Trial 281 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.94it/s]


** End of epoch, accumulated average loss = 4.766283 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.99it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.68it/s]
[I 2023-04-11 09:01:22,014] Trial 282 finished with value: 0.1505457282649605 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.20it/s]


** End of epoch, accumulated average loss = 4.886846 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.64it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.44it/s]
[I 2023-04-11 09:01:25,126] Trial 283 finished with value: 0.1505457282649605 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.62it/s]


** End of epoch, accumulated average loss = 4.789894 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.63it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.67it/s]
[I 2023-04-11 09:01:28,188] Trial 284 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.71it/s]


** End of epoch, accumulated average loss = 4.696050 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.63it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.64it/s]
[I 2023-04-11 09:01:31,251] Trial 285 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.1, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.56it/s]


** End of epoch, accumulated average loss = 4.723658 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.30it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.51it/s]
[I 2023-04-11 09:01:35,097] Trial 286 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.60it/s]


** End of epoch, accumulated average loss = 5.011805 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.69it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.47it/s]
[I 2023-04-11 09:01:38,181] Trial 287 finished with value: 0.1530690341343946 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.15it/s]


** End of epoch, accumulated average loss = 4.838512 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.71it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.48it/s]
[I 2023-04-11 09:01:41,239] Trial 288 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.68it/s]


** End of epoch, accumulated average loss = 4.757554 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.62it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.53it/s]
[I 2023-04-11 09:01:44,450] Trial 289 finished with value: 0.15055706112616682 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.24it/s]


** End of epoch, accumulated average loss = 4.842198 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.72it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.78it/s]
[I 2023-04-11 09:01:48,378] Trial 290 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 29.66it/s]


** End of epoch, accumulated average loss = 5.148114 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.76it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.54it/s]
[I 2023-04-11 09:01:51,452] Trial 291 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.1, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.42it/s]


** End of epoch, accumulated average loss = 4.771367 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.79it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.54it/s]
[I 2023-04-11 09:01:54,494] Trial 292 finished with value: 0.151894888736994 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.51it/s]


** End of epoch, accumulated average loss = 4.922982 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.59it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.74it/s]
[I 2023-04-11 09:01:57,545] Trial 293 finished with value: 0.15071590052750566 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.2, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.70it/s]


** End of epoch, accumulated average loss = 4.903846 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.08it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.75it/s]
[I 2023-04-11 09:02:01,373] Trial 294 finished with value: 0.15055706112616682 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.28it/s]


** End of epoch, accumulated average loss = 4.861244 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.71it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.70it/s]
[I 2023-04-11 09:02:04,409] Trial 295 finished with value: 0.1505457282649605 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.001}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.05it/s]


** End of epoch, accumulated average loss = 5.044502 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.72it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.68it/s]
[I 2023-04-11 09:02:07,457] Trial 296 finished with value: 0.1506931886678722 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.38it/s]


** End of epoch, accumulated average loss = 4.916169 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.64it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.69it/s]
[I 2023-04-11 09:02:10,518] Trial 297 finished with value: 0.15245064410397136 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.37it/s]


** End of epoch, accumulated average loss = 5.146752 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.97it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.60it/s]
[I 2023-04-11 09:02:14,415] Trial 298 finished with value: 0.16518004625041294 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.1, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.27it/s]


** End of epoch, accumulated average loss = 4.967829 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.78it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.67it/s]
[I 2023-04-11 09:02:17,460] Trial 299 finished with value: 0.15432098765432098 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.1, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 29.53it/s]


** End of epoch, accumulated average loss = 4.778613 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.64it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.59it/s]
[I 2023-04-11 09:02:20,572] Trial 300 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.1, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 14 with value: 0.17022725338326666.


Best trial:
  Score: 0.170
  Params:
    dropout_embedding: 0.15
    dropout_attention: 0.1
    dropout_residual: 0.15
    dropout_relu: 0.2
    lr_peak: 0.0005
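
The reported score is easier to read as a distance. Assuming this study used the same objective as the `objective` function defined in the next cell, which returns `1 / metric_values['mean_ld@1']`, the score is the inverse of the mean `mean_ld@1` value (taken here to be a mean edit distance). The helper name `mean_ld_from_score` below is hypothetical and only illustrates the conversion:

```python
def mean_ld_from_score(score):
    """Invert an Optuna score of the form 1 / mean_ld@1 (assumed objective)."""
    if score <= 0:
        raise ValueError("score must be positive to be inverted")
    return 1.0 / score


# Values copied from the log above.
best_score = 0.17022725338326666      # best trial (trial 14)
common_score = 0.1505343971097396     # value shared by many trials

print(f"best trial mean_ld@1   ~ {mean_ld_from_score(best_score):.2f}")
print(f"common trial mean_ld@1 ~ {mean_ld_from_score(common_score):.2f}")
```

Under that assumption the best trial corresponds to a mean distance of roughly 5.87 and the frequently repeated value to roughly 6.64, so the gap between trials is less than one edit per name.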


#### Smoothing with alpha = 0.05
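
For reference, a minimal sketch of what label smoothing with `alpha = 0.05` could look like. The notebook's own loss is not shown in this section, so this is an assumption: it uses the common variant that spreads `alpha` uniformly over all classes and keeps `1 - alpha` extra mass on the gold class. The function names are hypothetical.

```python
import math


def smoothed_targets(gold_index, vocab_size, alpha=0.05):
    """Target distribution: alpha spread uniformly, the rest on the gold class."""
    base = alpha / vocab_size
    dist = [base] * vocab_size
    dist[gold_index] += 1.0 - alpha
    return dist


def smoothed_cross_entropy(logits, gold_index, alpha=0.05):
    """Cross-entropy between softmax(logits) and the smoothed target."""
    # Numerically stable log-softmax.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    log_probs = [v - log_z for v in logits]
    target = smoothed_targets(gold_index, len(logits), alpha)
    return -sum(t * lp for t, lp in zip(target, log_probs))


logits = [2.0, 0.5, -1.0, 0.0]
print(smoothed_cross_entropy(logits, gold_index=0, alpha=0.0))   # plain cross-entropy
print(smoothed_cross_entropy(logits, gold_index=0, alpha=0.05))  # smoothed loss
```

With `alpha = 0` the function reduces to ordinary cross-entropy; a positive `alpha` penalizes over-confident predictions on the gold character.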

In [None]:
import numpy as np
import optuna

# Define the hyper-parameter space to search.
def objective(trial):
    part2ixy = load_dataset(TRANSLIT_PATH, parts=SCORED_PARTS1)
    train_ids, train_strings, train_transliterations = part2ixy['train_small']
    dropout = {
        'embedding': trial.suggest_categorical('dropout_embedding', [0.1, 0.15, 0.2]),
        'attention': trial.suggest_categorical('dropout_attention', [0.1, 0.15, 0.2]),
        'residual': trial.suggest_categorical('dropout_residual', [0.1, 0.15, 0.2]),
        'relu': trial.suggest_categorical('dropout_relu', [0.1, 0.15, 0.2]),
    }
    train_config = {
        'batch_size': 200,
        'n_epochs': 1,
        'lr_scheduler': {
            'type': 'warmup,decay_linear',
            'warmup_steps_part': 0.1,
            'lr_peak': trial.suggest_categorical('lr_peak', [3e-4, 5e-4, 1e-3, 2e-3]),
        },
    }

    # Train the model with the current hyper-parameters.
    # The 0.05 argument is presumably the smoothing alpha named in the heading.
    learnable_params = train(train_strings, train_transliterations, 1, 0.05, train_config, dropout)

    # Score every loaded part; the value from the last part in part2ixy is what gets returned.
    for part, (ids, x, y) in part2ixy.items():
        # Transliterate the source strings x (the original passed the references y here)
        # and compare the predictions against y.
        preds = classify(x, learnable_params)
        metric_values = compute_metrics(np.squeeze(preds), y, ['mean_ld@1'])

    # The study maximizes, so return the inverse of mean_ld@1 (lower distance is better).
    return 1 / metric_values['mean_ld@1']

# Run the hyper-parameter search with Optuna.
study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=301)

# Print the best hyper-parameter values and the corresponding objective score.
print('Best trial:')
trial = study.best_trial
print(f'  Score: {trial.value:.3f}')
print('  Params:')
for key, value in trial.params.items():
    print(f'    {key}: {value}')
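
The `lr_scheduler` entry in `train_config` names a `'warmup,decay_linear'` schedule with `warmup_steps_part` and `lr_peak`. The scheduler code itself is not shown in this section, so the following is only one plausible reading of those settings: the learning rate rises linearly to `lr_peak` over the first `warmup_steps_part` of the steps and then decays linearly to zero. The function name and the 1-based step convention are assumptions.

```python
def lr_at_step(step, total_steps, lr_peak=5e-4, warmup_steps_part=0.1):
    """Learning rate at a 1-based step for linear warmup followed by linear decay."""
    warmup_steps = max(1, int(total_steps * warmup_steps_part))
    if step <= warmup_steps:
        # Linear ramp from lr_peak / warmup_steps up to lr_peak.
        return lr_peak * step / warmup_steps
    # Linear decay from lr_peak (end of warmup) down to 0 at the last step.
    decay_steps = max(1, total_steps - warmup_steps)
    return lr_peak * max(0, total_steps - step) / decay_steps


# The logs show 10 training iterations per trial (one epoch, batch size 200).
for step in range(1, 11):
    print(step, f"{lr_at_step(step, total_steps=10):.6f}")
```

With only 10 steps per trial, the warmup under this reading lasts a single step, so most of each trial is spent in the decay phase. That may be one reason the sampled `lr_peak` values produce such similar scores.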

[I 2023-04-11 09:02:20,603] A new study created in memory with name: no-name-66c11ee7-1d0f-4dbe-9b7d-a8187ea9a372


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 34.73it/s]


** End of epoch, accumulated average loss = 5.339468 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.17it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.08it/s]
[I 2023-04-11 09:02:23,461] Trial 0 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 0 with value: 0.1505343971097396.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.52it/s]


** End of epoch, accumulated average loss = 4.736740 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.10it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.98it/s]
[I 2023-04-11 09:02:27,279] Trial 1 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.15, 'lr_peak': 0.001}. Best is trial 0 with value: 0.1505343971097396.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.25it/s]


** End of epoch, accumulated average loss = 4.755674 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.99it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.62it/s]
[I 2023-04-11 09:02:30,230] Trial 2 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.002}. Best is trial 0 with value: 0.1505343971097396.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.67it/s]


** End of epoch, accumulated average loss = 4.628183 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.01it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.29it/s]
[I 2023-04-11 09:02:33,062] Trial 3 finished with value: 0.15076134479119555 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0003}. Best is trial 3 with value: 0.15076134479119555.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 35.33it/s]


** End of epoch, accumulated average loss = 4.856773 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.14it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.03it/s]
[I 2023-04-11 09:02:35,911] Trial 4 finished with value: 0.1505457282649605 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.1, 'dropout_residual': 0.2, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 3 with value: 0.15076134479119555.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.56it/s]


** End of epoch, accumulated average loss = 4.649237 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.97it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.67it/s]
[I 2023-04-11 09:02:39,492] Trial 5 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.2, 'dropout_relu': 0.2, 'lr_peak': 0.0003}. Best is trial 3 with value: 0.15076134479119555.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 27.23it/s]


** End of epoch, accumulated average loss = 5.101476 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.96it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.06it/s]
[I 2023-04-11 09:02:42,468] Trial 6 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.1, 'dropout_residual': 0.2, 'dropout_relu': 0.2, 'lr_peak': 0.002}. Best is trial 3 with value: 0.15076134479119555.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.42it/s]


** End of epoch, accumulated average loss = 4.782153 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.20it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.00it/s]
[I 2023-04-11 09:02:45,321] Trial 7 finished with value: 0.16427104722792607 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 7 with value: 0.16427104722792607.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.51it/s]


** End of epoch, accumulated average loss = 4.923463 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.18it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.09it/s]
[I 2023-04-11 09:02:48,169] Trial 8 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.2, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 7 with value: 0.16427104722792607.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.49it/s]


** End of epoch, accumulated average loss = 5.010448 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.65it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.94it/s]
[I 2023-04-11 09:02:51,556] Trial 9 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.1, 'dropout_residual': 0.2, 'dropout_relu': 0.15, 'lr_peak': 0.0003}. Best is trial 7 with value: 0.16427104722792607.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.59it/s]


** End of epoch, accumulated average loss = 5.119538 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.51it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.21it/s]
[I 2023-04-11 09:02:54,670] Trial 10 finished with value: 0.1523113243469652 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 7 with value: 0.16427104722792607.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 34.18it/s]


** End of epoch, accumulated average loss = 4.750794 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.18it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.23it/s]
[I 2023-04-11 09:02:57,484] Trial 11 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 7 with value: 0.16427104722792607.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.79it/s]


** End of epoch, accumulated average loss = 4.542244 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.07it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.01it/s]
[I 2023-04-11 09:03:00,357] Trial 12 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 7 with value: 0.16427104722792607.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.88it/s]


** End of epoch, accumulated average loss = 4.440755 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.90it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.89it/s]
[I 2023-04-11 09:03:03,450] Trial 13 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 7 with value: 0.16427104722792607.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 23.80it/s]


** End of epoch, accumulated average loss = 4.979886 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.39it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.01it/s]
[I 2023-04-11 09:03:06,943] Trial 14 finished with value: 0.15946420028703556 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.001}. Best is trial 7 with value: 0.16427104722792607.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 35.42it/s]


** End of epoch, accumulated average loss = 4.677069 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.09it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.93it/s]
[I 2023-04-11 09:03:09,802] Trial 15 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.001}. Best is trial 7 with value: 0.16427104722792607.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.95it/s]


** End of epoch, accumulated average loss = 4.844576 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.21it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.89it/s]
[I 2023-04-11 09:03:12,664] Trial 16 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.001}. Best is trial 7 with value: 0.16427104722792607.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 34.38it/s]


** End of epoch, accumulated average loss = 4.440531 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.04it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.02it/s]
[I 2023-04-11 09:03:15,714] Trial 17 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.001}. Best is trial 7 with value: 0.16427104722792607.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.79it/s]


** End of epoch, accumulated average loss = 4.649863 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.79it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.77it/s]
[I 2023-04-11 09:03:19,276] Trial 18 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.1, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.001}. Best is trial 7 with value: 0.16427104722792607.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 34.37it/s]


** End of epoch, accumulated average loss = 4.755506 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.04it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.16it/s]
[I 2023-04-11 09:03:22,117] Trial 19 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.002}. Best is trial 7 with value: 0.16427104722792607.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.79it/s]


** End of epoch, accumulated average loss = 4.986295 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.04it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.08it/s]
[I 2023-04-11 09:03:24,991] Trial 20 finished with value: 0.15729453401494298 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.001}. Best is trial 7 with value: 0.16427104722792607.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 34.78it/s]


** End of epoch, accumulated average loss = 5.050720 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.82it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.20it/s]
[I 2023-04-11 09:03:27,864] Trial 21 finished with value: 0.1509547890406823 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.001}. Best is trial 7 with value: 0.16427104722792607.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.16it/s]


** End of epoch, accumulated average loss = 4.704748 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.68it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.31it/s]
[I 2023-04-11 09:03:31,570] Trial 22 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.001}. Best is trial 7 with value: 0.16427104722792607.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 34.22it/s]


** End of epoch, accumulated average loss = 4.720468 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.11it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.80it/s]
[I 2023-04-11 09:03:34,469] Trial 23 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.001}. Best is trial 7 with value: 0.16427104722792607.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.81it/s]


** End of epoch, accumulated average loss = 4.703739 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.94it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.13it/s]
[I 2023-04-11 09:03:37,346] Trial 24 finished with value: 0.15055706112616682 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.001}. Best is trial 7 with value: 0.16427104722792607.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.88it/s]


** End of epoch, accumulated average loss = 4.543221 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.12it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.06it/s]
[I 2023-04-11 09:03:40,212] Trial 25 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.001}. Best is trial 7 with value: 0.16427104722792607.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 34.15it/s]


** End of epoch, accumulated average loss = 4.813992 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.80it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.34it/s]
[I 2023-04-11 09:03:43,892] Trial 26 finished with value: 0.15057973196807709 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 7 with value: 0.16427104722792607.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 34.37it/s]


** End of epoch, accumulated average loss = 5.545086 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.02it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.05it/s]
[I 2023-04-11 09:03:46,889] Trial 27 finished with value: 0.15232292460015232 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.002}. Best is trial 7 with value: 0.16427104722792607.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 34.05it/s]


** End of epoch, accumulated average loss = 4.736764 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.22it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.94it/s]
[I 2023-04-11 09:03:49,754] Trial 28 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.1, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.001}. Best is trial 7 with value: 0.16427104722792607.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.42it/s]


** End of epoch, accumulated average loss = 4.911936 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.20it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.23it/s]
[I 2023-04-11 09:03:52,595] Trial 29 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0003}. Best is trial 7 with value: 0.16427104722792607.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 34.74it/s]


** End of epoch, accumulated average loss = 5.041110 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.33it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.83it/s]
[I 2023-04-11 09:03:56,271] Trial 30 finished with value: 0.1629062474545899 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 7 with value: 0.16427104722792607.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 34.71it/s]


** End of epoch, accumulated average loss = 4.771670 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.89it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.12it/s]
[I 2023-04-11 09:03:59,142] Trial 31 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 7 with value: 0.16427104722792607.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 34.05it/s]


** End of epoch, accumulated average loss = 4.642316 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.03it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.00it/s]
[I 2023-04-11 09:04:02,012] Trial 32 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 7 with value: 0.16427104722792607.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.77it/s]


** End of epoch, accumulated average loss = 4.798611 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.16it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.03it/s]
[I 2023-04-11 09:04:04,864] Trial 33 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 7 with value: 0.16427104722792607.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 34.56it/s]


** End of epoch, accumulated average loss = 4.617037 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.02it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.78it/s]
[I 2023-04-11 09:04:08,395] Trial 34 finished with value: 0.16174686615446826 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 7 with value: 0.16427104722792607.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 24.34it/s]


** End of epoch, accumulated average loss = 4.732104 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.93it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.05it/s]
[I 2023-04-11 09:04:11,431] Trial 35 finished with value: 0.15061375103546953 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 7 with value: 0.16427104722792607.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.50it/s]


** End of epoch, accumulated average loss = 5.036487 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.92it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.75it/s]
[I 2023-04-11 09:04:14,380] Trial 36 finished with value: 0.15621338748730765 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 7 with value: 0.16427104722792607.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.70it/s]


** End of epoch, accumulated average loss = 4.739160 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.84it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.05it/s]
[I 2023-04-11 09:04:17,309] Trial 37 finished with value: 0.15106881184379484 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 7 with value: 0.16427104722792607.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.62it/s]


** End of epoch, accumulated average loss = 4.855769 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.47it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.90it/s]
[I 2023-04-11 09:04:20,734] Trial 38 finished with value: 0.15177961599757153 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.2, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 7 with value: 0.16427104722792607.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.52it/s]


** End of epoch, accumulated average loss = 4.812693 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.32it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.30it/s]
[I 2023-04-11 09:04:23,880] Trial 39 finished with value: 0.15224175991474462 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 7 with value: 0.16427104722792607.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 34.65it/s]


** End of epoch, accumulated average loss = 5.560936 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.71it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.85it/s]
[I 2023-04-11 09:04:26,954] Trial 40 finished with value: 0.07356998344675372 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.2, 'dropout_relu': 0.1, 'lr_peak': 0.002}. Best is trial 7 with value: 0.16427104722792607.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 34.44it/s]


** End of epoch, accumulated average loss = 4.524821 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.78it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.74it/s]
[I 2023-04-11 09:04:29,918] Trial 41 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.1, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 7 with value: 0.16427104722792607.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.63it/s]


** End of epoch, accumulated average loss = 4.756898 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.12it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.82it/s]
[I 2023-04-11 09:04:33,263] Trial 42 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.001}. Best is trial 7 with value: 0.16427104722792607.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.76it/s]


** End of epoch, accumulated average loss = 5.122212 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.05it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.13it/s]
[I 2023-04-11 09:04:36,478] Trial 43 finished with value: 0.16048788316482107 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 7 with value: 0.16427104722792607.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.29it/s]


** End of epoch, accumulated average loss = 4.544121 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.72it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.98it/s]
[I 2023-04-11 09:04:39,422] Trial 44 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 7 with value: 0.16427104722792607.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.74it/s]


** End of epoch, accumulated average loss = 4.514296 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.05it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.18it/s]
[I 2023-04-11 09:04:42,270] Trial 45 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 7 with value: 0.16427104722792607.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.48it/s]


** End of epoch, accumulated average loss = 4.664660 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.98it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.24it/s]
[I 2023-04-11 09:04:45,516] Trial 46 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.2, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 7 with value: 0.16427104722792607.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.87it/s]


** End of epoch, accumulated average loss = 4.636410 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.24it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.92it/s]
[I 2023-04-11 09:04:48,944] Trial 47 finished with value: 0.16578249336870027 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.1, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.32it/s]


** End of epoch, accumulated average loss = 4.673857 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.10it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.05it/s]
[I 2023-04-11 09:04:51,826] Trial 48 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.1, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.30it/s]


** End of epoch, accumulated average loss = 4.886476 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.14it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.03it/s]
[I 2023-04-11 09:04:54,687] Trial 49 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.1, 'dropout_residual': 0.15, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 34.41it/s]


** End of epoch, accumulated average loss = 4.776530 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.89it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.05it/s]
[I 2023-04-11 09:04:57,749] Trial 50 finished with value: 0.15080681646810434 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.1, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.54it/s]


** End of epoch, accumulated average loss = 4.574873 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.75it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.03it/s]
[I 2023-04-11 09:05:01,297] Trial 51 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.1, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.84it/s]


** End of epoch, accumulated average loss = 5.366793 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.04it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.91it/s]
[I 2023-04-11 09:05:04,191] Trial 52 finished with value: 0.15062509414068384 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 34.72it/s]


** End of epoch, accumulated average loss = 4.480910 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.10it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.99it/s]
[I 2023-04-11 09:05:07,172] Trial 53 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.1, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 34.25it/s]


** End of epoch, accumulated average loss = 4.808910 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.96it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.60it/s]
[I 2023-04-11 09:05:10,122] Trial 54 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.87it/s]


** End of epoch, accumulated average loss = 5.059136 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.60it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.60it/s]
[I 2023-04-11 09:05:13,810] Trial 55 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.002}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 34.35it/s]


** End of epoch, accumulated average loss = 4.758329 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.97it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.81it/s]
[I 2023-04-11 09:05:16,732] Trial 56 finished with value: 0.15305731996632738 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.2, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.25it/s]


** End of epoch, accumulated average loss = 4.749557 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.93it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.79it/s]
[I 2023-04-11 09:05:19,685] Trial 57 finished with value: 0.15149219815179518 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.1, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 34.57it/s]


** End of epoch, accumulated average loss = 4.687471 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.98it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.79it/s]
[I 2023-04-11 09:05:22,598] Trial 58 finished with value: 0.15257857796765334 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.92it/s]


** End of epoch, accumulated average loss = 5.101339 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.84it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.85it/s]
[I 2023-04-11 09:05:26,306] Trial 59 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.47it/s]


** End of epoch, accumulated average loss = 4.685574 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.80it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.89it/s]
[I 2023-04-11 09:05:29,258] Trial 60 finished with value: 0.15055706112616682 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.80it/s]


** End of epoch, accumulated average loss = 4.710881 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.88it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.97it/s]
[I 2023-04-11 09:05:32,181] Trial 61 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.001}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.52it/s]


** End of epoch, accumulated average loss = 4.534675 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.71it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:02<00:00,  3.81it/s]
[I 2023-04-11 09:05:36,956] Trial 62 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.001}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 23.24it/s]


** End of epoch, accumulated average loss = 4.668157 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.51it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.18it/s]
[I 2023-04-11 09:05:40,264] Trial 63 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.001}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 34.90it/s]


** End of epoch, accumulated average loss = 4.767170 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:02<00:00,  4.79it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.06it/s]
[I 2023-04-11 09:05:43,971] Trial 64 finished with value: 0.1506931886678722 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.001}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.92it/s]


** End of epoch, accumulated average loss = 4.718720 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.04it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.23it/s]
[I 2023-04-11 09:05:48,258] Trial 65 finished with value: 0.15305731996632738 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.001}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.55it/s]


** End of epoch, accumulated average loss = 4.711231 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.57it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.65it/s]
[I 2023-04-11 09:05:52,091] Trial 66 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.002}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.44it/s]


** End of epoch, accumulated average loss = 5.040735 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.94it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.93it/s]
[I 2023-04-11 09:05:55,184] Trial 67 finished with value: 0.15160703456640387 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 18.96it/s]


** End of epoch, accumulated average loss = 4.627974 **
** Elapsed time: 0:00:01**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.37it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.89it/s]
[I 2023-04-11 09:05:58,667] Trial 68 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.001}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.56it/s]


** End of epoch, accumulated average loss = 4.853809 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.01it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.60it/s]
[I 2023-04-11 09:06:01,835] Trial 69 finished with value: 0.1525087692542321 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.1, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:01,  7.35it/s]


** End of epoch, accumulated average loss = 4.667407 **
** Elapsed time: 0:00:01**
Using GPU device: cuda


100%|██████████| 10/10 [00:02<00:00,  3.96it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.01it/s]
[I 2023-04-11 09:06:07,352] Trial 70 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.14it/s]


** End of epoch, accumulated average loss = 4.581049 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.96it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.80it/s]
[I 2023-04-11 09:06:10,300] Trial 71 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.58it/s]


** End of epoch, accumulated average loss = 4.461436 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.93it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.25it/s]
[I 2023-04-11 09:06:13,169] Trial 72 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.60it/s]


** End of epoch, accumulated average loss = 4.661310 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.15it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:02<00:00,  3.79it/s]
[I 2023-04-11 09:06:18,158] Trial 73 finished with value: 0.15092061575611226 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.27it/s]


** End of epoch, accumulated average loss = 4.635459 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.61it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.96it/s]
[I 2023-04-11 09:06:21,136] Trial 74 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 34.15it/s]


** End of epoch, accumulated average loss = 4.557170 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.76it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.31it/s]
[I 2023-04-11 09:06:24,399] Trial 75 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.001}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.50it/s]


** End of epoch, accumulated average loss = 4.837221 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.86it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.32it/s]
[I 2023-04-11 09:06:27,428] Trial 76 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.2, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.48it/s]


** End of epoch, accumulated average loss = 5.251555 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.63it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.89it/s]
[I 2023-04-11 09:06:31,043] Trial 77 finished with value: 0.15195259079167298 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.78it/s]


** End of epoch, accumulated average loss = 5.452305 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.67it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.98it/s]
[I 2023-04-11 09:06:33,994] Trial 78 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0003}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.35it/s]


** End of epoch, accumulated average loss = 5.059829 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.04it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.90it/s]
[I 2023-04-11 09:06:37,037] Trial 79 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.002}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.23it/s]


** End of epoch, accumulated average loss = 4.600059 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.93it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.62it/s]
[I 2023-04-11 09:06:40,006] Trial 80 finished with value: 0.15086369465188204 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.1, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.99it/s]


** End of epoch, accumulated average loss = 4.822363 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.70it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.33it/s]
[I 2023-04-11 09:06:43,693] Trial 81 finished with value: 0.15725743041358703 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.2, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.61it/s]


** End of epoch, accumulated average loss = 4.648820 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.87it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.69it/s]
[I 2023-04-11 09:06:46,656] Trial 82 finished with value: 0.15108022359873094 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.2, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.95it/s]


** End of epoch, accumulated average loss = 4.404396 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.80it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.00it/s]
[I 2023-04-11 09:06:49,577] Trial 83 finished with value: 0.1505683956937439 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.2, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 34.42it/s]


** End of epoch, accumulated average loss = 4.814585 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.95it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.81it/s]
[I 2023-04-11 09:06:52,510] Trial 84 finished with value: 0.1507386192342478 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.2, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.45it/s]


** End of epoch, accumulated average loss = 4.683886 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:02<00:00,  3.81it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.95it/s]
[I 2023-04-11 09:06:56,933] Trial 85 finished with value: 0.1505457282649605 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.2, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 34.74it/s]


** End of epoch, accumulated average loss = 4.495642 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.99it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.00it/s]
[I 2023-04-11 09:06:59,821] Trial 86 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.2, 'dropout_relu': 0.15, 'lr_peak': 0.001}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.83it/s]


** End of epoch, accumulated average loss = 4.554868 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.02it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.86it/s]
[I 2023-04-11 09:07:02,732] Trial 87 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.58it/s]


** End of epoch, accumulated average loss = 4.571847 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.01it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.85it/s]
[I 2023-04-11 09:07:05,652] Trial 88 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.1, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.91it/s]


** End of epoch, accumulated average loss = 4.753626 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.78it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.93it/s]
[I 2023-04-11 09:07:09,376] Trial 89 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.32it/s]


** End of epoch, accumulated average loss = 4.792140 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.90it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.90it/s]
[I 2023-04-11 09:07:12,318] Trial 90 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.001}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.51it/s]


** End of epoch, accumulated average loss = 4.499005 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.97it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.81it/s]
[I 2023-04-11 09:07:15,273] Trial 91 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.2, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.48it/s]


** End of epoch, accumulated average loss = 4.645803 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.85it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.88it/s]
[I 2023-04-11 09:07:18,344] Trial 92 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.2, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.20it/s]


** End of epoch, accumulated average loss = 5.066742 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.89it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.81it/s]
[I 2023-04-11 09:07:22,073] Trial 93 finished with value: 0.15093200513168817 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.2, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.21it/s]


** End of epoch, accumulated average loss = 4.656147 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.82it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.94it/s]
[I 2023-04-11 09:07:25,010] Trial 94 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.2, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.89it/s]


** End of epoch, accumulated average loss = 4.763942 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.69it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.89it/s]
[I 2023-04-11 09:07:27,976] Trial 95 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.2, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.65it/s]


** End of epoch, accumulated average loss = 5.003864 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.39it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.98it/s]
[I 2023-04-11 09:07:30,987] Trial 96 finished with value: 0.15061375103546953 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0003}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 29.84it/s]


** End of epoch, accumulated average loss = 4.620388 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.68it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.45it/s]
[I 2023-04-11 09:07:34,732] Trial 97 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.34it/s]


** End of epoch, accumulated average loss = 4.908880 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.55it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.68it/s]
[I 2023-04-11 09:07:37,765] Trial 98 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.1, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.001}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.22it/s]


** End of epoch, accumulated average loss = 4.886902 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.00it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.84it/s]
[I 2023-04-11 09:07:40,705] Trial 99 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.2, 'dropout_relu': 0.2, 'lr_peak': 0.002}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.71it/s]


** End of epoch, accumulated average loss = 5.134807 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.94it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.95it/s]
[I 2023-04-11 09:07:43,637] Trial 100 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.97it/s]


** End of epoch, accumulated average loss = 4.719275 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.89it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.95it/s]
[I 2023-04-11 09:07:47,427] Trial 101 finished with value: 0.1505457282649605 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.001}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.61it/s]


** End of epoch, accumulated average loss = 4.604866 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.80it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.68it/s]
[I 2023-04-11 09:07:50,421] Trial 102 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.001}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.38it/s]


** End of epoch, accumulated average loss = 4.720040 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.91it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.60it/s]
[I 2023-04-11 09:07:53,408] Trial 103 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.001}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.79it/s]


** End of epoch, accumulated average loss = 5.062123 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.97it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.71it/s]
[I 2023-04-11 09:07:56,491] Trial 104 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.001}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 29.02it/s]


** End of epoch, accumulated average loss = 4.535514 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.75it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.17it/s]
[I 2023-04-11 09:08:00,293] Trial 105 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.001}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.72it/s]


** End of epoch, accumulated average loss = 4.914655 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.97it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.86it/s]
[I 2023-04-11 09:08:03,237] Trial 106 finished with value: 0.15703517587939697 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.57it/s]


** End of epoch, accumulated average loss = 5.162020 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.79it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.88it/s]
[I 2023-04-11 09:08:06,218] Trial 107 finished with value: 0.15106881184379484 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.50it/s]


** End of epoch, accumulated average loss = 4.878673 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.64it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.98it/s]
[I 2023-04-11 09:08:09,399] Trial 108 finished with value: 0.1605007623786213 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.1, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 14.80it/s]


** End of epoch, accumulated average loss = 4.929384 **
** Elapsed time: 0:00:01**
Using GPU device: cuda


100%|██████████| 10/10 [00:04<00:00,  2.39it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.80it/s]
[I 2023-04-11 09:08:15,744] Trial 109 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.1, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.28it/s]


** End of epoch, accumulated average loss = 4.779776 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.24it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.68it/s]
[I 2023-04-11 09:08:19,069] Trial 110 finished with value: 0.15063643895458312 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.1, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 29.70it/s]


** End of epoch, accumulated average loss = 4.707163 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:02<00:00,  4.83it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:02<00:00,  4.36it/s]
[I 2023-04-11 09:08:23,875] Trial 111 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.1, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 13.15it/s]


** End of epoch, accumulated average loss = 4.605683 **
** Elapsed time: 0:00:01**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.91it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.85it/s]
[I 2023-04-11 09:08:27,791] Trial 112 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.1, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.86it/s]


** End of epoch, accumulated average loss = 4.700139 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.90it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.79it/s]
[I 2023-04-11 09:08:30,746] Trial 113 finished with value: 0.15063643895458312 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.1, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.43it/s]


** End of epoch, accumulated average loss = 5.065355 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.77it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.45it/s]
[I 2023-04-11 09:08:34,259] Trial 114 finished with value: 0.1505457282649605 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.56it/s]


** End of epoch, accumulated average loss = 4.794370 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.86it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.91it/s]
[I 2023-04-11 09:08:37,655] Trial 115 finished with value: 0.15072725902479464 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 20.59it/s]


** End of epoch, accumulated average loss = 4.835738 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.53it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.89it/s]
[I 2023-04-11 09:08:41,077] Trial 116 finished with value: 0.15055706112616682 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.98it/s]


** End of epoch, accumulated average loss = 4.555467 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.97it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.88it/s]
[I 2023-04-11 09:08:44,129] Trial 117 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.1, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.40it/s]


** End of epoch, accumulated average loss = 4.700529 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.70it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.85it/s]
[I 2023-04-11 09:08:47,090] Trial 118 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.2, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.88it/s]


** End of epoch, accumulated average loss = 4.627534 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.62it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.89it/s]
[I 2023-04-11 09:08:50,493] Trial 119 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.79it/s]


** End of epoch, accumulated average loss = 4.959046 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.37it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.83it/s]
[I 2023-04-11 09:08:53,906] Trial 120 finished with value: 0.15061375103546953 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 34.30it/s]


** End of epoch, accumulated average loss = 4.599183 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.89it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.83it/s]
[I 2023-04-11 09:08:56,859] Trial 121 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.001}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.26it/s]


** End of epoch, accumulated average loss = 4.576383 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.71it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.84it/s]
[I 2023-04-11 09:08:59,845] Trial 122 finished with value: 0.15109163707788772 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.36it/s]


** End of epoch, accumulated average loss = 4.488677 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.94it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.20it/s]
[I 2023-04-11 09:09:03,134] Trial 123 finished with value: 0.1505457282649605 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.001}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.08it/s]


** End of epoch, accumulated average loss = 5.056975 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.28it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.94it/s]
[I 2023-04-11 09:09:06,593] Trial 124 finished with value: 0.15057973196807709 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.75it/s]


** End of epoch, accumulated average loss = 5.469575 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.88it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.75it/s]
[I 2023-04-11 09:09:09,546] Trial 125 finished with value: 0.15092061575611226 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.002}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.67it/s]


** End of epoch, accumulated average loss = 4.583999 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.96it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.77it/s]
[I 2023-04-11 09:09:12,500] Trial 126 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.42it/s]


** End of epoch, accumulated average loss = 5.105303 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.56it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.28it/s]
[I 2023-04-11 09:09:15,817] Trial 127 finished with value: 0.15108022359873094 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.2, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.74it/s]


** End of epoch, accumulated average loss = 4.810796 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.00it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.86it/s]
[I 2023-04-11 09:09:19,339] Trial 128 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.1, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.001}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.90it/s]


** End of epoch, accumulated average loss = 4.831651 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.49it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.05it/s]
[I 2023-04-11 09:09:22,313] Trial 129 finished with value: 0.1505457282649605 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.85it/s]


** End of epoch, accumulated average loss = 4.704509 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.85it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.87it/s]
[I 2023-04-11 09:09:25,393] Trial 130 finished with value: 0.15379883112888343 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.00it/s]


** End of epoch, accumulated average loss = 4.690775 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.74it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.36it/s]
[I 2023-04-11 09:09:28,681] Trial 131 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.55it/s]


** End of epoch, accumulated average loss = 4.633044 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.86it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.03it/s]
[I 2023-04-11 09:09:32,233] Trial 132 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.05it/s]


** End of epoch, accumulated average loss = 4.580923 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.86it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.75it/s]
[I 2023-04-11 09:09:35,196] Trial 133 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.77it/s]


** End of epoch, accumulated average loss = 4.870383 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.77it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.60it/s]
[I 2023-04-11 09:09:38,211] Trial 134 finished with value: 0.15142337976983647 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.67it/s]


** End of epoch, accumulated average loss = 4.672935 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.91it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.47it/s]
[I 2023-04-11 09:09:41,423] Trial 135 finished with value: 0.15055706112616682 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.92it/s]


** End of epoch, accumulated average loss = 4.539603 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.86it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.90it/s]
[I 2023-04-11 09:09:44,977] Trial 136 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.2, 'dropout_relu': 0.2, 'lr_peak': 0.001}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.55it/s]


** End of epoch, accumulated average loss = 4.915511 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.80it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.54it/s]
[I 2023-04-11 09:09:47,996] Trial 137 finished with value: 0.15081818867355404 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.56it/s]


** End of epoch, accumulated average loss = 4.683905 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.86it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.47it/s]
[I 2023-04-11 09:09:51,045] Trial 138 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.48it/s]


** End of epoch, accumulated average loss = 4.662232 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.91it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.54it/s]
[I 2023-04-11 09:09:54,256] Trial 139 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 23.00it/s]


** End of epoch, accumulated average loss = 4.437624 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:02<00:00,  4.10it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.78it/s]
[I 2023-04-11 09:09:58,557] Trial 140 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.1, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.71it/s]


** End of epoch, accumulated average loss = 4.628448 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.89it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.68it/s]
[I 2023-04-11 09:10:01,544] Trial 141 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.68it/s]


** End of epoch, accumulated average loss = 4.600965 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.94it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.92it/s]
[I 2023-04-11 09:10:04,615] Trial 142 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.30it/s]


** End of epoch, accumulated average loss = 4.858600 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.86it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.74it/s]
[I 2023-04-11 09:10:08,062] Trial 143 finished with value: 0.1505683956937439 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.91it/s]


** End of epoch, accumulated average loss = 4.626467 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.53it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.74it/s]
[I 2023-04-11 09:10:11,458] Trial 144 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.33it/s]


** End of epoch, accumulated average loss = 4.914848 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.72it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.68it/s]
[I 2023-04-11 09:10:14,470] Trial 145 finished with value: 0.06775067750677508 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.15, 'lr_peak': 0.001}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.81it/s]


** End of epoch, accumulated average loss = 5.228524 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.58it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.02it/s]
[I 2023-04-11 09:10:17,451] Trial 146 finished with value: 0.15313935681470137 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.2, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.51it/s]


** End of epoch, accumulated average loss = 4.801999 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.81it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.75it/s]
[I 2023-04-11 09:10:20,912] Trial 147 finished with value: 0.15138899402013473 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.2, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 20.85it/s]


** End of epoch, accumulated average loss = 4.908332 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.45it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.83it/s]
[I 2023-04-11 09:10:24,362] Trial 148 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.2, 'dropout_relu': 0.1, 'lr_peak': 0.002}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.82it/s]


** End of epoch, accumulated average loss = 4.538890 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.76it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.62it/s]
[I 2023-04-11 09:10:27,392] Trial 149 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.2, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.49it/s]


** End of epoch, accumulated average loss = 4.738799 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.83it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.76it/s]
[I 2023-04-11 09:10:30,390] Trial 150 finished with value: 0.15212596029512437 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.2, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.57it/s]


** End of epoch, accumulated average loss = 4.591374 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.56it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.81it/s]
[I 2023-04-11 09:10:33,855] Trial 151 finished with value: 0.15092061575611226 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.2, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.83it/s]


** End of epoch, accumulated average loss = 4.642120 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.41it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.67it/s]
[I 2023-04-11 09:10:37,329] Trial 152 finished with value: 0.15055706112616682 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.91it/s]


** End of epoch, accumulated average loss = 4.616305 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.68it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.86it/s]
[I 2023-04-11 09:10:40,329] Trial 153 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.61it/s]


** End of epoch, accumulated average loss = 5.145761 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.78it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.65it/s]
[I 2023-04-11 09:10:43,353] Trial 154 finished with value: 0.15116015418335726 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.1, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.80it/s]


** End of epoch, accumulated average loss = 4.799451 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.62it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.85it/s]
[I 2023-04-11 09:10:46,916] Trial 155 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.001}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 23.51it/s]


** End of epoch, accumulated average loss = 4.620042 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.40it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.77it/s]
[I 2023-04-11 09:10:50,331] Trial 156 finished with value: 0.15212596029512437 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.2, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.54it/s]


** End of epoch, accumulated average loss = 4.582671 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.59it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.70it/s]
[I 2023-04-11 09:10:53,374] Trial 157 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.33it/s]


** End of epoch, accumulated average loss = 4.616874 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.57it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.91it/s]
[I 2023-04-11 09:10:56,377] Trial 158 finished with value: 0.15114873035066506 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.75it/s]


** End of epoch, accumulated average loss = 4.865853 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.68it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.83it/s]
[I 2023-04-11 09:10:59,803] Trial 159 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.2, 'dropout_relu': 0.1, 'lr_peak': 0.001}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 23.35it/s]


** End of epoch, accumulated average loss = 4.889232 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.56it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.89it/s]
[I 2023-04-11 09:11:03,160] Trial 160 finished with value: 0.15150367396409362 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.1, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.51it/s]


** End of epoch, accumulated average loss = 4.571853 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.82it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.86it/s]
[I 2023-04-11 09:11:06,136] Trial 161 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.1, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.15it/s]


** End of epoch, accumulated average loss = 5.139318 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.63it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.65it/s]
[I 2023-04-11 09:11:09,176] Trial 162 finished with value: 0.07671358981243527 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.1, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 29.87it/s]


** End of epoch, accumulated average loss = 4.971667 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.78it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.97it/s]
[I 2023-04-11 09:11:12,582] Trial 163 finished with value: 0.15074998115625235 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.1, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.10it/s]


** End of epoch, accumulated average loss = 4.722640 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.36it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.64it/s]
[I 2023-04-11 09:11:16,080] Trial 164 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.1, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.60it/s]


** End of epoch, accumulated average loss = 4.796686 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.57it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.78it/s]
[I 2023-04-11 09:11:19,083] Trial 165 finished with value: 0.15159554309103312 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.1, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.87it/s]


** End of epoch, accumulated average loss = 4.596185 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.58it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.78it/s]
[I 2023-04-11 09:11:22,101] Trial 166 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.55it/s]


** End of epoch, accumulated average loss = 5.021120 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.64it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.76it/s]
[I 2023-04-11 09:11:25,703] Trial 167 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.001}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 20.67it/s]


** End of epoch, accumulated average loss = 4.571987 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.71it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.67it/s]
[I 2023-04-11 09:11:29,109] Trial 168 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.59it/s]


** End of epoch, accumulated average loss = 4.705516 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.67it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.82it/s]
[I 2023-04-11 09:11:32,115] Trial 169 finished with value: 0.15229972586049345 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.49it/s]


** End of epoch, accumulated average loss = 4.949311 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.00it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.70it/s]
[I 2023-04-11 09:11:35,061] Trial 170 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.2, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.55it/s]


** End of epoch, accumulated average loss = 4.789357 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.81it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.78it/s]
[I 2023-04-11 09:11:38,491] Trial 171 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.002}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.78it/s]


** End of epoch, accumulated average loss = 4.745484 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.55it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.82it/s]
[I 2023-04-11 09:11:41,886] Trial 172 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.002}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.42it/s]


** End of epoch, accumulated average loss = 4.763095 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.66it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.69it/s]
[I 2023-04-11 09:11:44,892] Trial 173 finished with value: 0.15313935681470137 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.002}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.16it/s]


** End of epoch, accumulated average loss = 5.070804 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.62it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.72it/s]
[I 2023-04-11 09:11:47,906] Trial 174 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.002}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.15it/s]


** End of epoch, accumulated average loss = 5.020308 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.59it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.73it/s]
[I 2023-04-11 09:11:51,400] Trial 175 finished with value: 0.08377314233056882 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.002}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 23.46it/s]


** End of epoch, accumulated average loss = 4.593420 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.39it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.40it/s]
[I 2023-04-11 09:11:54,877] Trial 176 finished with value: 0.15463120457708365 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.62it/s]


** End of epoch, accumulated average loss = 5.097923 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.78it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.66it/s]
[I 2023-04-11 09:11:57,911] Trial 177 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.002}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.28it/s]


** End of epoch, accumulated average loss = 5.102946 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.86it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.86it/s]
[I 2023-04-11 09:12:00,882] Trial 178 finished with value: 0.15331544653123802 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.88it/s]


** End of epoch, accumulated average loss = 4.794766 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.80it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.90it/s]
[I 2023-04-11 09:12:04,276] Trial 179 finished with value: 0.15087507543753773 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.09it/s]


** End of epoch, accumulated average loss = 4.878756 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.62it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.66it/s]
[I 2023-04-11 09:12:07,865] Trial 180 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.001}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.91it/s]


** End of epoch, accumulated average loss = 4.830536 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.65it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.64it/s]
[I 2023-04-11 09:12:10,928] Trial 181 finished with value: 0.15057973196807709 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.80it/s]


** End of epoch, accumulated average loss = 4.778218 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.81it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.90it/s]
[I 2023-04-11 09:12:13,898] Trial 182 finished with value: 0.15080681646810434 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.59it/s]


** End of epoch, accumulated average loss = 4.610972 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.37it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.84it/s]
[I 2023-04-11 09:12:17,401] Trial 183 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.53it/s]


** End of epoch, accumulated average loss = 4.808741 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.87it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.76it/s]
[I 2023-04-11 09:12:20,758] Trial 184 finished with value: 0.1516185277840952 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.75it/s]


** End of epoch, accumulated average loss = 4.835340 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.83it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.44it/s]
[I 2023-04-11 09:12:23,813] Trial 185 finished with value: 0.15263680073265665 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.88it/s]


** End of epoch, accumulated average loss = 4.858251 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.63it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.58it/s]
[I 2023-04-11 09:12:26,865] Trial 186 finished with value: 0.15422578655151142 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.36it/s]


** End of epoch, accumulated average loss = 4.662311 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.58it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.60it/s]
[I 2023-04-11 09:12:30,389] Trial 187 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.30it/s]


** End of epoch, accumulated average loss = 4.792577 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.81it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.83it/s]
[I 2023-04-11 09:12:33,752] Trial 188 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.72it/s]


** End of epoch, accumulated average loss = 4.702756 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.95it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.52it/s]
[I 2023-04-11 09:12:36,751] Trial 189 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.001}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.82it/s]


** End of epoch, accumulated average loss = 4.865907 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.97it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.87it/s]
[I 2023-04-11 09:12:39,687] Trial 190 finished with value: 0.1524274064476793 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.93it/s]


** End of epoch, accumulated average loss = 4.622990 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.77it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.88it/s]
[I 2023-04-11 09:12:43,098] Trial 191 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 16.62it/s]


** End of epoch, accumulated average loss = 4.609211 **
** Elapsed time: 0:00:01**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.74it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.61it/s]
[I 2023-04-11 09:12:46,646] Trial 192 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.40it/s]


** End of epoch, accumulated average loss = 4.638991 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.72it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.79it/s]
[I 2023-04-11 09:12:49,660] Trial 193 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.87it/s]


** End of epoch, accumulated average loss = 5.138034 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.66it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.74it/s]
[I 2023-04-11 09:12:52,670] Trial 194 finished with value: 0.1513317191283293 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.39it/s]


** End of epoch, accumulated average loss = 4.615769 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.56it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.84it/s]
[I 2023-04-11 09:12:56,134] Trial 195 finished with value: 0.15620118712902217 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.36it/s]


** End of epoch, accumulated average loss = 4.923128 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.48it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.70it/s]
[I 2023-04-11 09:12:59,557] Trial 196 finished with value: 0.15057973196807709 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.69it/s]


** End of epoch, accumulated average loss = 4.613772 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.82it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.75it/s]
[I 2023-04-11 09:13:02,554] Trial 197 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.11it/s]


** End of epoch, accumulated average loss = 4.656331 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.87it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.76it/s]
[I 2023-04-11 09:13:05,539] Trial 198 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.2, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.82it/s]


** End of epoch, accumulated average loss = 5.007249 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.60it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.78it/s]
[I 2023-04-11 09:13:09,004] Trial 199 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.001}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 23.50it/s]


** End of epoch, accumulated average loss = 4.672507 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.33it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.03it/s]
[I 2023-04-11 09:13:12,391] Trial 200 finished with value: 0.15119443604475355 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.58it/s]


** End of epoch, accumulated average loss = 4.684112 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.92it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.73it/s]
[I 2023-04-11 09:13:15,387] Trial 201 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 29.30it/s]


** End of epoch, accumulated average loss = 4.473149 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.77it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.57it/s]
[I 2023-04-11 09:13:18,443] Trial 202 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.19it/s]


** End of epoch, accumulated average loss = 4.954015 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.75it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.96it/s]
[I 2023-04-11 09:13:21,843] Trial 203 finished with value: 0.151894888736994 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.31it/s]


** End of epoch, accumulated average loss = 5.019451 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.11it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.77it/s]
[I 2023-04-11 09:13:25,382] Trial 204 finished with value: 0.15239256324291373 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.91it/s]


** End of epoch, accumulated average loss = 4.644874 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.62it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.67it/s]
[I 2023-04-11 09:13:28,550] Trial 205 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.2, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.90it/s]


** End of epoch, accumulated average loss = 4.550395 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.63it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.75it/s]
[I 2023-04-11 09:13:31,590] Trial 206 finished with value: 0.15087507543753773 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.03it/s]


** End of epoch, accumulated average loss = 4.866281 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.76it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.56it/s]
[I 2023-04-11 09:13:35,117] Trial 207 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.88it/s]


** End of epoch, accumulated average loss = 4.863305 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.71it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.67it/s]
[I 2023-04-11 09:13:38,515] Trial 208 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.002}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.54it/s]


** End of epoch, accumulated average loss = 4.680204 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.78it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.83it/s]
[I 2023-04-11 09:13:41,511] Trial 209 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.2, 'dropout_relu': 0.2, 'lr_peak': 0.0003}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.21it/s]


** End of epoch, accumulated average loss = 4.605575 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.90it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.96it/s]
[I 2023-04-11 09:13:44,454] Trial 210 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.67it/s]


** End of epoch, accumulated average loss = 4.659053 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.92it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.06it/s]
[I 2023-04-11 09:13:47,791] Trial 211 finished with value: 0.15074998115625235 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 23.97it/s]


** End of epoch, accumulated average loss = 4.759634 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.19it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.75it/s]
[I 2023-04-11 09:13:51,245] Trial 212 finished with value: 0.15137753557372086 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.30it/s]


** End of epoch, accumulated average loss = 4.763301 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.86it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.86it/s]
[I 2023-04-11 09:13:54,211] Trial 213 finished with value: 0.15057973196807709 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.11it/s]


** End of epoch, accumulated average loss = 4.625751 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.89it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.93it/s]
[I 2023-04-11 09:13:57,153] Trial 214 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.25it/s]


** End of epoch, accumulated average loss = 4.741157 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.86it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.28it/s]
[I 2023-04-11 09:14:00,436] Trial 215 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.001}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 20.44it/s]


** End of epoch, accumulated average loss = 4.704880 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.10it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.08it/s]
[I 2023-04-11 09:14:03,930] Trial 216 finished with value: 0.1505457282649605 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.26it/s]


** End of epoch, accumulated average loss = 5.074500 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.00it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.95it/s]
[I 2023-04-11 09:14:06,966] Trial 217 finished with value: 0.15667841754798276 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.83it/s]


** End of epoch, accumulated average loss = 4.954639 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.86it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.64it/s]
[I 2023-04-11 09:14:09,967] Trial 218 finished with value: 0.15772870662460567 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.2, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.93it/s]


** End of epoch, accumulated average loss = 4.866294 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.84it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.57it/s]
[I 2023-04-11 09:14:13,204] Trial 219 finished with value: 0.15055706112616682 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.2, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 23.18it/s]


** End of epoch, accumulated average loss = 4.704022 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.99it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.84it/s]
[I 2023-04-11 09:14:16,725] Trial 220 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.2, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.43it/s]


** End of epoch, accumulated average loss = 4.781996 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.81it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.79it/s]
[I 2023-04-11 09:14:19,706] Trial 221 finished with value: 0.152473888846535 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.2, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.37it/s]


** End of epoch, accumulated average loss = 4.466516 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.68it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.67it/s]
[I 2023-04-11 09:14:22,722] Trial 222 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.2, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.91it/s]


** End of epoch, accumulated average loss = 4.577863 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.87it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.35it/s]
[I 2023-04-11 09:14:25,990] Trial 223 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.2, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.36it/s]


** End of epoch, accumulated average loss = 5.314586 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.67it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.85it/s]
[I 2023-04-11 09:14:29,612] Trial 224 finished with value: 0.15113730824454016 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 29.53it/s]


** End of epoch, accumulated average loss = 4.881494 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.80it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.68it/s]
[I 2023-04-11 09:14:32,658] Trial 225 finished with value: 0.15329194450831607 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.39it/s]


** End of epoch, accumulated average loss = 5.391885 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.74it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.81it/s]
[I 2023-04-11 09:14:35,654] Trial 226 finished with value: 0.15815277558121144 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.001}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.31it/s]


** End of epoch, accumulated average loss = 4.980876 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.66it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.56it/s]
[I 2023-04-11 09:14:38,927] Trial 227 finished with value: 0.16093988895147662 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.34it/s]


** End of epoch, accumulated average loss = 4.652149 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.73it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.85it/s]
[I 2023-04-11 09:14:42,550] Trial 228 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.69it/s]


** End of epoch, accumulated average loss = 4.688700 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.75it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.90it/s]
[I 2023-04-11 09:14:45,537] Trial 229 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 47 with value: 0.16578249336870027.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.54it/s]


** End of epoch, accumulated average loss = 4.891190 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.74it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.54it/s]
[I 2023-04-11 09:14:48,678] Trial 230 finished with value: 0.1671541997492687 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.001}. Best is trial 230 with value: 0.1671541997492687.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.07it/s]


** End of epoch, accumulated average loss = 4.847878 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.64it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.29it/s]
[I 2023-04-11 09:14:51,993] Trial 231 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.001}. Best is trial 230 with value: 0.1671541997492687.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.40it/s]


** End of epoch, accumulated average loss = 4.784580 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.78it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.49it/s]
[I 2023-04-11 09:14:55,648] Trial 232 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.001}. Best is trial 230 with value: 0.1671541997492687.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.57it/s]


** End of epoch, accumulated average loss = 4.507661 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.70it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.36it/s]
[I 2023-04-11 09:14:58,734] Trial 233 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.001}. Best is trial 230 with value: 0.1671541997492687.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.08it/s]


** End of epoch, accumulated average loss = 5.239136 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.84it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.72it/s]
[I 2023-04-11 09:15:01,720] Trial 234 finished with value: 0.15055706112616682 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.001}. Best is trial 230 with value: 0.1671541997492687.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.56it/s]


** End of epoch, accumulated average loss = 4.624329 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.99it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.47it/s]
[I 2023-04-11 09:15:04,946] Trial 235 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.001}. Best is trial 230 with value: 0.1671541997492687.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.25it/s]


** End of epoch, accumulated average loss = 4.769665 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.04it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.04it/s]
[I 2023-04-11 09:15:08,449] Trial 236 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.002}. Best is trial 230 with value: 0.1671541997492687.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.71it/s]


** End of epoch, accumulated average loss = 4.711817 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.76it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.95it/s]
[I 2023-04-11 09:15:11,431] Trial 237 finished with value: 0.15057973196807709 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 230 with value: 0.1671541997492687.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.18it/s]


** End of epoch, accumulated average loss = 4.529193 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.10it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.05it/s]
[I 2023-04-11 09:15:14,374] Trial 238 finished with value: 0.15074998115625235 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 230 with value: 0.1671541997492687.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.12it/s]


** End of epoch, accumulated average loss = 4.520906 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.73it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.97it/s]
[I 2023-04-11 09:15:17,518] Trial 239 finished with value: 0.15871756209824617 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 230 with value: 0.1671541997492687.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 20.59it/s]


** End of epoch, accumulated average loss = 4.790053 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.76it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.91it/s]
[I 2023-04-11 09:15:21,154] Trial 240 finished with value: 0.15214910612400154 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 230 with value: 0.1671541997492687.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.05it/s]


** End of epoch, accumulated average loss = 4.658654 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.86it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.99it/s]
[I 2023-04-11 09:15:24,113] Trial 241 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 230 with value: 0.1671541997492687.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.08it/s]


** End of epoch, accumulated average loss = 4.628860 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.84it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.84it/s]
[I 2023-04-11 09:15:27,098] Trial 242 finished with value: 0.1505457282649605 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 230 with value: 0.1671541997492687.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.57it/s]


** End of epoch, accumulated average loss = 4.641767 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.46it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.79it/s]
[I 2023-04-11 09:15:30,484] Trial 243 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 230 with value: 0.1671541997492687.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.54it/s]


** End of epoch, accumulated average loss = 4.715069 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.85it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.99it/s]
[I 2023-04-11 09:15:34,041] Trial 244 finished with value: 0.15057973196807709 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 230 with value: 0.1671541997492687.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.57it/s]


** End of epoch, accumulated average loss = 4.689186 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.92it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.83it/s]
[I 2023-04-11 09:15:37,013] Trial 245 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.001}. Best is trial 230 with value: 0.1671541997492687.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.45it/s]


** End of epoch, accumulated average loss = 4.643341 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.69it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.97it/s]
[I 2023-04-11 09:15:39,984] Trial 246 finished with value: 0.150681835304754 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 230 with value: 0.1671541997492687.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.19it/s]


** End of epoch, accumulated average loss = 4.605932 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.88it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.26it/s]
[I 2023-04-11 09:15:43,049] Trial 247 finished with value: 0.16258840744654907 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 230 with value: 0.1671541997492687.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 23.22it/s]


** End of epoch, accumulated average loss = 4.881249 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.71it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.39it/s]
[I 2023-04-11 09:15:46,717] Trial 248 finished with value: 0.15055706112616682 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 230 with value: 0.1671541997492687.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.83it/s]


** End of epoch, accumulated average loss = 4.999638 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.86it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.86it/s]
[I 2023-04-11 09:15:49,695] Trial 249 finished with value: 0.15067048365225252 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 230 with value: 0.1671541997492687.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.27it/s]


** End of epoch, accumulated average loss = 4.562885 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.90it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.62it/s]
[I 2023-04-11 09:15:52,691] Trial 250 finished with value: 0.1505457282649605 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 230 with value: 0.1671541997492687.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.37it/s]


** End of epoch, accumulated average loss = 4.635583 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.87it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.93it/s]
[I 2023-04-11 09:15:55,678] Trial 251 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 230 with value: 0.1671541997492687.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.28it/s]


** End of epoch, accumulated average loss = 5.193529 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.42it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.68it/s]
[I 2023-04-11 09:15:59,613] Trial 252 finished with value: 0.1516530178950561 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 230 with value: 0.1671541997492687.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.94it/s]


** End of epoch, accumulated average loss = 4.471822 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.00it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.92it/s]
[I 2023-04-11 09:16:02,566] Trial 253 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.1, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 230 with value: 0.1671541997492687.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.73it/s]


** End of epoch, accumulated average loss = 4.862555 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.83it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.82it/s]
[I 2023-04-11 09:16:05,561] Trial 254 finished with value: 0.15109163707788772 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 230 with value: 0.1671541997492687.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.29it/s]


** End of epoch, accumulated average loss = 4.602079 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.77it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.76it/s]
[I 2023-04-11 09:16:08,560] Trial 255 finished with value: 0.15086369465188204 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 230 with value: 0.1671541997492687.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.61it/s]


** End of epoch, accumulated average loss = 4.837974 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.65it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.53it/s]
[I 2023-04-11 09:16:12,418] Trial 256 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.001}. Best is trial 230 with value: 0.1671541997492687.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.65it/s]


** End of epoch, accumulated average loss = 5.014893 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.86it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.78it/s]
[I 2023-04-11 09:16:15,531] Trial 257 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.1, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 230 with value: 0.1671541997492687.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.59it/s]


** End of epoch, accumulated average loss = 4.598543 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.67it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.75it/s]
[I 2023-04-11 09:16:18,575] Trial 258 finished with value: 0.15063643895458312 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 230 with value: 0.1671541997492687.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.23it/s]


** End of epoch, accumulated average loss = 4.639233 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.74it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.56it/s]
[I 2023-04-11 09:16:21,622] Trial 259 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 230 with value: 0.1671541997492687.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.63it/s]


** End of epoch, accumulated average loss = 5.074945 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.83it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.47it/s]
[I 2023-04-11 09:16:25,493] Trial 260 finished with value: 0.15322148165172755 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 230 with value: 0.1671541997492687.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.37it/s]


** End of epoch, accumulated average loss = 4.774098 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.72it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.80it/s]
[I 2023-04-11 09:16:28,499] Trial 261 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 230 with value: 0.1671541997492687.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.04it/s]


** End of epoch, accumulated average loss = 4.834357 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.56it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.84it/s]
[I 2023-04-11 09:16:31,531] Trial 262 finished with value: 0.1520103367028958 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.1, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 230 with value: 0.1671541997492687.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.35it/s]


** End of epoch, accumulated average loss = 5.244179 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.66it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.70it/s]
[I 2023-04-11 09:16:34,548] Trial 263 finished with value: 0.1507386192342478 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.001}. Best is trial 230 with value: 0.1671541997492687.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 25.59it/s]


** End of epoch, accumulated average loss = 4.856412 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.80it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.42it/s]
[I 2023-04-11 09:16:38,345] Trial 264 finished with value: 0.15915963711602737 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 230 with value: 0.1671541997492687.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.62it/s]


** End of epoch, accumulated average loss = 4.709452 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.90it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.83it/s]
[I 2023-04-11 09:16:41,320] Trial 265 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 230 with value: 0.1671541997492687.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.83it/s]


** End of epoch, accumulated average loss = 4.620113 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.90it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.71it/s]
[I 2023-04-11 09:16:44,296] Trial 266 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 230 with value: 0.1671541997492687.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.02it/s]


** End of epoch, accumulated average loss = 4.554987 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.94it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.72it/s]
[I 2023-04-11 09:16:47,314] Trial 267 finished with value: 0.150681835304754 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 230 with value: 0.1671541997492687.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 26.01it/s]


** End of epoch, accumulated average loss = 4.630942 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.71it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.31it/s]
[I 2023-04-11 09:16:51,155] Trial 268 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.1, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.001}. Best is trial 230 with value: 0.1671541997492687.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 29.71it/s]


** End of epoch, accumulated average loss = 4.526207 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.93it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.85it/s]
[I 2023-04-11 09:16:54,266] Trial 269 finished with value: 0.1524390243902439 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 230 with value: 0.1671541997492687.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.17it/s]


** End of epoch, accumulated average loss = 4.787261 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.61it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.90it/s]
[I 2023-04-11 09:16:57,273] Trial 270 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 230 with value: 0.1671541997492687.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.30it/s]


** End of epoch, accumulated average loss = 4.552862 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.71it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.59it/s]
[I 2023-04-11 09:17:00,321] Trial 271 finished with value: 0.1505457282649605 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 230 with value: 0.1671541997492687.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 24.64it/s]


** End of epoch, accumulated average loss = 5.116983 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.76it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.54it/s]
[I 2023-04-11 09:17:04,115] Trial 272 finished with value: 0.15167602002123465 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 230 with value: 0.1671541997492687.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.67it/s]


** End of epoch, accumulated average loss = 4.849487 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.70it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.73it/s]
[I 2023-04-11 09:17:07,139] Trial 273 finished with value: 0.15067048365225252 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.001}. Best is trial 230 with value: 0.1671541997492687.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.96it/s]


** End of epoch, accumulated average loss = 4.696567 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.67it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.70it/s]
[I 2023-04-11 09:17:10,168] Trial 274 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 230 with value: 0.1671541997492687.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.56it/s]


** End of epoch, accumulated average loss = 4.703106 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.88it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.78it/s]
[I 2023-04-11 09:17:13,152] Trial 275 finished with value: 0.15060240963855423 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 230 with value: 0.1671541997492687.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 27.93it/s]


** End of epoch, accumulated average loss = 4.707211 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.70it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.11it/s]
[I 2023-04-11 09:17:17,031] Trial 276 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 230 with value: 0.1671541997492687.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.64it/s]


** End of epoch, accumulated average loss = 4.809147 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.42it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.79it/s]
[I 2023-04-11 09:17:20,100] Trial 277 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 230 with value: 0.1671541997492687.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.32it/s]


** End of epoch, accumulated average loss = 4.416898 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.61it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.78it/s]
[I 2023-04-11 09:17:23,148] Trial 278 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.1, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.001}. Best is trial 230 with value: 0.1671541997492687.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.84it/s]


** End of epoch, accumulated average loss = 4.858300 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.85it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.61it/s]
[I 2023-04-11 09:17:26,180] Trial 279 finished with value: 0.15064778547755348 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 230 with value: 0.1671541997492687.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 25.43it/s]


** End of epoch, accumulated average loss = 4.500624 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.53it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.18it/s]
[I 2023-04-11 09:17:30,116] Trial 280 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 230 with value: 0.1671541997492687.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.25it/s]


** End of epoch, accumulated average loss = 5.148076 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.57it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.81it/s]
[I 2023-04-11 09:17:33,291] Trial 281 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0003}. Best is trial 230 with value: 0.1671541997492687.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.03it/s]


** End of epoch, accumulated average loss = 5.225747 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.72it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.67it/s]
[I 2023-04-11 09:17:36,335] Trial 282 finished with value: 0.1512401693889897 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 230 with value: 0.1671541997492687.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.41it/s]


** End of epoch, accumulated average loss = 4.431474 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.74it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.71it/s]
[I 2023-04-11 09:17:39,368] Trial 283 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 230 with value: 0.1671541997492687.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.97it/s]


** End of epoch, accumulated average loss = 4.776067 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.89it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.82it/s]
[I 2023-04-11 09:17:43,146] Trial 284 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.001}. Best is trial 230 with value: 0.1671541997492687.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.84it/s]


** End of epoch, accumulated average loss = 4.627137 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.82it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.86it/s]
[I 2023-04-11 09:17:46,120] Trial 285 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 230 with value: 0.1671541997492687.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.27it/s]


** End of epoch, accumulated average loss = 4.530087 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.65it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.74it/s]
[I 2023-04-11 09:17:49,142] Trial 286 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 230 with value: 0.1671541997492687.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.68it/s]


** End of epoch, accumulated average loss = 4.437111 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.86it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.66it/s]
[I 2023-04-11 09:17:52,142] Trial 287 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.1, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 230 with value: 0.1671541997492687.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 24.39it/s]


** End of epoch, accumulated average loss = 4.874468 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.93it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.37it/s]
[I 2023-04-11 09:17:55,933] Trial 288 finished with value: 0.15070454374199382 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 230 with value: 0.1671541997492687.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.83it/s]


** End of epoch, accumulated average loss = 4.758613 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.67it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.79it/s]
[I 2023-04-11 09:17:58,960] Trial 289 finished with value: 0.1505683956937439 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 230 with value: 0.1671541997492687.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.49it/s]


** End of epoch, accumulated average loss = 4.924444 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.62it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.89it/s]
[I 2023-04-11 09:18:01,974] Trial 290 finished with value: 0.15119443604475355 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.001}. Best is trial 230 with value: 0.1671541997492687.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 27.54it/s]


** End of epoch, accumulated average loss = 4.838963 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.89it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.81it/s]
[I 2023-04-11 09:18:05,011] Trial 291 finished with value: 0.15940065354267952 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 230 with value: 0.1671541997492687.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 24.67it/s]


** End of epoch, accumulated average loss = 4.804115 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.64it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.39it/s]
[I 2023-04-11 09:18:08,882] Trial 292 finished with value: 0.15080681646810434 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 230 with value: 0.1671541997492687.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.38it/s]


** End of epoch, accumulated average loss = 4.773935 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.56it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.69it/s]
[I 2023-04-11 09:18:11,938] Trial 293 finished with value: 0.1626412946247052 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 230 with value: 0.1671541997492687.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 23.61it/s]


** End of epoch, accumulated average loss = 4.743043 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.89it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.95it/s]
[I 2023-04-11 09:18:15,009] Trial 294 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 230 with value: 0.1671541997492687.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.60it/s]


** End of epoch, accumulated average loss = 4.695464 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.62it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.47it/s]
[I 2023-04-11 09:18:18,083] Trial 295 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.001}. Best is trial 230 with value: 0.1671541997492687.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 23.57it/s]


** End of epoch, accumulated average loss = 4.559713 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.79it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.36it/s]
[I 2023-04-11 09:18:21,931] Trial 296 finished with value: 0.15720798616569723 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 230 with value: 0.1671541997492687.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.70it/s]


** End of epoch, accumulated average loss = 4.618099 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.65it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.77it/s]
[I 2023-04-11 09:18:24,991] Trial 297 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 230 with value: 0.1671541997492687.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.29it/s]


** End of epoch, accumulated average loss = 5.135930 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.83it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.61it/s]
[I 2023-04-11 09:18:28,023] Trial 298 finished with value: 0.15305731996632738 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 230 with value: 0.1671541997492687.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.84it/s]


** End of epoch, accumulated average loss = 4.639486 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.77it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.65it/s]
[I 2023-04-11 09:18:31,054] Trial 299 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 230 with value: 0.1671541997492687.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 23.40it/s]


** End of epoch, accumulated average loss = 4.818182 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.94it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.06it/s]
[I 2023-04-11 09:18:34,940] Trial 300 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 230 with value: 0.1671541997492687.


Best trial:
  Score: 0.167
  Params:
    dropout_embedding: 0.2
    dropout_attention: 0.15
    dropout_residual: 0.15
    dropout_relu: 0.1
    lr_peak: 0.001
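
The score printed above is whatever the Optuna objective returns. In the search cell of this section that is `1 / mean_ld@1`, the reciprocal of the mean Levenshtein distance between predictions and gold transliterations; if the study summarized above used the same objective (an assumption, since its cell is not shown here), a score of 0.167 corresponds to a mean distance of roughly 5.98 edits per name. The sketch below illustrates that relationship with a plain dynamic-programming edit distance. The helper names `levenshtein` and `reciprocal_mean_ld` are hypothetical and are not the notebook's `compute_metrics`.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def reciprocal_mean_ld(preds, golds):
    """Hypothetical stand-in for the objective value: 1 / mean edit distance."""
    mean_ld = sum(levenshtein(p, g) for p, g in zip(preds, golds)) / len(golds)
    # A perfect model has mean_ld == 0; return infinity instead of dividing by zero.
    return 1.0 / mean_ld if mean_ld > 0 else float("inf")
```

Because the reciprocal grows as the distance shrinks, the study has to be created with `direction='maximize'`; minimizing `mean_ld@1` directly would be an equivalent choice.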


#### Smoothing with alpha = 0.1
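
The heading refers to a smoothing coefficient of 0.1, presumably label smoothing applied to the training loss (the `0.1` passed to `train` in the cell below). As a minimal sketch, assuming the common uniform variant in which the gold token keeps `1 - alpha` of the probability mass and `alpha` is spread evenly over the whole vocabulary, the smoothed cross-entropy for a single position can be written as follows. The function names are hypothetical and the notebook's own loss may differ in detail.

```python
import math


def smoothed_targets(gold_index, vocab_size, alpha=0.1):
    """Target distribution: alpha / V on every class, plus (1 - alpha) on the gold class."""
    base = alpha / vocab_size
    targets = [base] * vocab_size
    targets[gold_index] += 1.0 - alpha
    return targets


def smoothed_cross_entropy(logits, gold_index, alpha=0.1):
    """Cross-entropy between the smoothed targets and softmax(logits)."""
    max_logit = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_z = max_logit + math.log(sum(math.exp(v - max_logit) for v in logits))
    log_probs = [v - log_z for v in logits]
    targets = smoothed_targets(gold_index, len(logits), alpha)
    return -sum(t * lp for t, lp in zip(targets, log_probs))
```

With `alpha = 0` this reduces to the ordinary negative log-likelihood of the gold token; a positive `alpha` penalizes over-confident predictions.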

In [None]:
import numpy as np
import optuna


def objective(trial):
    """Train for one epoch with sampled hyper-parameters and score on dev_small."""
    part2ixy = load_dataset(TRANSLIT_PATH, parts=SCORED_PARTS1)
    _, train_strings, train_transliterations = part2ixy['train_small']
    _, val_strings, val_transliterations = part2ixy['dev_small']

    # define the hyper-parameter space to search
    dropout = {
        'embedding': trial.suggest_categorical('dropout_embedding', [0.1, 0.15, 0.2]),
        'attention': trial.suggest_categorical('dropout_attention', [0.1, 0.15, 0.2]),
        'residual': trial.suggest_categorical('dropout_residual', [0.1, 0.15, 0.2]),
        'relu': trial.suggest_categorical('dropout_relu', [0.1, 0.15, 0.2]),
    }
    train_config = {
        'batch_size': 200,
        'n_epochs': 1,
        'lr_scheduler': {
            'type': 'warmup,decay_linear',
            'warmup_steps_part': 0.1,
            'lr_peak': trial.suggest_categorical('lr_peak', [3e-4, 5e-4, 1e-3, 2e-3]),
        },
    }

    # train the model with the current hyper-parameters
    # (positional arguments kept from the original call; 0.1 is presumably the smoothing alpha)
    learnable_params = train(train_strings, train_transliterations, 1, 0.1, train_config, dropout)

    # predict from the source strings, not the gold transliterations,
    # and score on the validation part only
    preds = classify(val_strings, learnable_params)
    metric_values = compute_metrics(np.squeeze(preds), val_transliterations, ['mean_ld@1'])

    # a lower mean Levenshtein distance is better, so maximize its reciprocal
    return 1 / metric_values['mean_ld@1']


# run the hyper-parameter search with Optuna
study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=301)

# print the best hyper-parameter values and the corresponding objective score
print('Best trial:')
trial = study.best_trial
print(f'  Score: {trial.value:.3f}')
print('  Params:')
for key, value in trial.params.items():
    print(f'    {key}: {value}')
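
The `lr_scheduler` entry in the cell above names a `'warmup,decay_linear'` schedule with `warmup_steps_part = 0.1` and a searched `lr_peak`. The notebook's scheduler implementation is not shown in this section, so the following is only a sketch of one plausible reading: the learning rate rises linearly to `lr_peak` during the first 10% of optimizer steps and then decays linearly towards zero. The function name `lr_at_step` and the exact endpoint handling are assumptions.

```python
def lr_at_step(step, total_steps, lr_peak, warmup_steps_part=0.1):
    """Learning rate for a 0-based optimizer step under linear warmup then linear decay."""
    warmup_steps = max(1, int(total_steps * warmup_steps_part))
    if step < warmup_steps:
        # linear warmup: reaches lr_peak on the last warmup step
        return lr_peak * (step + 1) / warmup_steps
    # linear decay from lr_peak towards zero over the remaining steps
    remaining = max(1, total_steps - warmup_steps)
    decay_progress = (step - warmup_steps) / remaining
    return lr_peak * max(0.0, 1.0 - decay_progress)


# The logs in this section show 10 iterations per epoch and n_epochs = 1,
# so a trial sees only about 10 optimizer steps in total.
schedule = [lr_at_step(s, total_steps=10, lr_peak=1e-3) for s in range(10)]
```

With so few steps per trial the warmup phase covers a single step, which may partly explain why many trials end with nearly identical scores.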

[I 2023-04-11 08:11:15,756] A new study created in memory with name: no-name-8ab2fc86-dee3-411f-a02d-62416305e8ad


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.76it/s]


** End of epoch, accumulated average loss = 4.697294 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.30it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.91it/s]
[I 2023-04-11 08:11:19,306] Trial 0 finished with value: 0.15369246138476908 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0003}. Best is trial 0 with value: 0.15369246138476908.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.63it/s]


** End of epoch, accumulated average loss = 4.752085 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.92it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.91it/s]
[I 2023-04-11 08:11:22,226] Trial 1 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.2, 'dropout_relu': 0.2, 'lr_peak': 0.0003}. Best is trial 0 with value: 0.15369246138476908.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.72it/s]


** End of epoch, accumulated average loss = 5.225473 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.51it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.64it/s]
[I 2023-04-11 08:11:25,711] Trial 2 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.002}. Best is trial 0 with value: 0.15369246138476908.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 17.86it/s]


** End of epoch, accumulated average loss = 4.835422 **
** Elapsed time: 0:00:01**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.29it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.64it/s]
[I 2023-04-11 08:11:29,762] Trial 3 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.1, 'dropout_residual': 0.15, 'dropout_relu': 0.15, 'lr_peak': 0.002}. Best is trial 0 with value: 0.15369246138476908.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 28.14it/s]


** End of epoch, accumulated average loss = 4.972817 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.29it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.86it/s]
[I 2023-04-11 08:11:33,094] Trial 4 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0003}. Best is trial 0 with value: 0.15369246138476908.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.00it/s]


** End of epoch, accumulated average loss = 4.898606 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.00it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.95it/s]
[I 2023-04-11 08:11:35,982] Trial 5 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.15, 'lr_peak': 0.001}. Best is trial 0 with value: 0.15369246138476908.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 34.10it/s]


** End of epoch, accumulated average loss = 4.873797 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.09it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.76it/s]
[I 2023-04-11 08:11:38,881] Trial 6 finished with value: 0.16 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.001}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.60it/s]


** End of epoch, accumulated average loss = 5.329139 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.71it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.69it/s]
[I 2023-04-11 08:11:42,506] Trial 7 finished with value: 0.15055706112616682 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.15, 'lr_peak': 0.002}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 27.98it/s]


** End of epoch, accumulated average loss = 4.703127 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.00it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.01it/s]
[I 2023-04-11 08:11:45,476] Trial 8 finished with value: 0.150681835304754 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0003}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.63it/s]


** End of epoch, accumulated average loss = 4.914104 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.93it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.90it/s]
[I 2023-04-11 08:11:48,379] Trial 9 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.1, 'dropout_residual': 0.15, 'dropout_relu': 0.15, 'lr_peak': 0.002}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.33it/s]


** End of epoch, accumulated average loss = 4.842958 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.84it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.97it/s]
[I 2023-04-11 08:11:51,302] Trial 10 finished with value: 0.15071590052750566 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.2, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.31it/s]


** End of epoch, accumulated average loss = 4.851084 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.54it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:03<00:00,  3.30it/s]
[I 2023-04-11 08:11:56,255] Trial 11 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.001}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.61it/s]


** End of epoch, accumulated average loss = 4.891805 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.91it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.89it/s]
[I 2023-04-11 08:11:59,302] Trial 12 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.001}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.75it/s]


** End of epoch, accumulated average loss = 4.735146 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.79it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.95it/s]
[I 2023-04-11 08:12:02,221] Trial 13 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.68it/s]


** End of epoch, accumulated average loss = 4.984745 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.95it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.86it/s]
[I 2023-04-11 08:12:05,144] Trial 14 finished with value: 0.15061375103546953 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.2, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.93it/s]


** End of epoch, accumulated average loss = 4.827131 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.32it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.54it/s]
[I 2023-04-11 08:12:08,913] Trial 15 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.1, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.001}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.57it/s]


** End of epoch, accumulated average loss = 4.683455 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.00it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:02<00:00,  3.71it/s]
[I 2023-04-11 08:12:13,438] Trial 16 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.06it/s]


** End of epoch, accumulated average loss = 5.023102 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.87it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.17it/s]
[I 2023-04-11 08:12:16,498] Trial 17 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.001}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 17.80it/s]


** End of epoch, accumulated average loss = 4.697222 **
** Elapsed time: 0:00:01**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.47it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.72it/s]
[I 2023-04-11 08:12:20,502] Trial 18 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.1, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.81it/s]


** End of epoch, accumulated average loss = 5.056110 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:02<00:00,  4.82it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.48it/s]
[I 2023-04-11 08:12:24,506] Trial 19 finished with value: 0.15055706112616682 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.2, 'dropout_relu': 0.2, 'lr_peak': 0.0003}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.00it/s]


** End of epoch, accumulated average loss = 4.579907 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.70it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.85it/s]
[I 2023-04-11 08:12:27,470] Trial 20 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.001}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.56it/s]


** End of epoch, accumulated average loss = 4.963427 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.94it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.32it/s]
[I 2023-04-11 08:12:30,717] Trial 21 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.2, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 18.11it/s]


** End of epoch, accumulated average loss = 4.832728 **
** Elapsed time: 0:00:01**
Using GPU device: cuda


100%|██████████| 10/10 [00:02<00:00,  4.30it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:02<00:00,  3.89it/s]
[I 2023-04-11 08:12:36,316] Trial 22 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.2, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.88it/s]


** End of epoch, accumulated average loss = 4.650901 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.58it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.79it/s]
[I 2023-04-11 08:12:39,312] Trial 23 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.2, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 23.34it/s]


** End of epoch, accumulated average loss = 5.212561 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.37it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.88it/s]
[I 2023-04-11 08:12:42,459] Trial 24 finished with value: 0.15120586678763134 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.2, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.98it/s]


** End of epoch, accumulated average loss = 4.644656 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.43it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.91it/s]
[I 2023-04-11 08:12:45,638] Trial 25 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.71it/s]


** End of epoch, accumulated average loss = 4.835888 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.58it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.82it/s]
[I 2023-04-11 08:12:49,272] Trial 26 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.2, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.26it/s]


** End of epoch, accumulated average loss = 4.847625 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.94it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.84it/s]
[I 2023-04-11 08:12:52,197] Trial 27 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.001}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.06it/s]


** End of epoch, accumulated average loss = 5.081867 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.63it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.68it/s]
[I 2023-04-11 08:12:55,212] Trial 28 finished with value: 0.15888147442008263 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.93it/s]


** End of epoch, accumulated average loss = 5.141620 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.80it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.91it/s]
[I 2023-04-11 08:12:58,338] Trial 29 finished with value: 0.15703517587939697 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0003}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.90it/s]


** End of epoch, accumulated average loss = 5.126941 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.56it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.98it/s]
[I 2023-04-11 08:13:01,970] Trial 30 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.001}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.71it/s]


** End of epoch, accumulated average loss = 5.130130 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.82it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.62it/s]
[I 2023-04-11 08:13:04,941] Trial 31 finished with value: 0.15067048365225252 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0003}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.11it/s]


** End of epoch, accumulated average loss = 4.818717 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.89it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.78it/s]
[I 2023-04-11 08:13:07,892] Trial 32 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0003}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.17it/s]


** End of epoch, accumulated average loss = 4.786397 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.84it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.95it/s]
[I 2023-04-11 08:13:11,017] Trial 33 finished with value: 0.1516300227445034 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0003}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 23.39it/s]


** End of epoch, accumulated average loss = 4.712946 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.55it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.52it/s]
[I 2023-04-11 08:13:14,709] Trial 34 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0003}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.10it/s]


** End of epoch, accumulated average loss = 5.014355 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.86it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.90it/s]
[I 2023-04-11 08:13:17,667] Trial 35 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.15, 'lr_peak': 0.002}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.74it/s]


** End of epoch, accumulated average loss = 5.085236 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.95it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.85it/s]
[I 2023-04-11 08:13:20,602] Trial 36 finished with value: 0.1525087692542321 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.15, 'lr_peak': 0.0003}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 24.13it/s]


** End of epoch, accumulated average loss = 5.379504 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.92it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.64it/s]
[I 2023-04-11 08:13:23,677] Trial 37 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.002}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.05it/s]


** End of epoch, accumulated average loss = 4.822081 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:02<00:00,  4.44it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.83it/s]
[I 2023-04-11 08:13:28,221] Trial 38 finished with value: 0.15064778547755348 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.1, 'dropout_residual': 0.15, 'dropout_relu': 0.15, 'lr_peak': 0.0003}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 34.30it/s]


** End of epoch, accumulated average loss = 4.946087 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.69it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.73it/s]
[I 2023-04-11 08:13:31,203] Trial 39 finished with value: 0.15878056525881232 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0003}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.93it/s]


** End of epoch, accumulated average loss = 4.634927 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.68it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.70it/s]
[I 2023-04-11 08:13:34,205] Trial 40 finished with value: 0.1505683956937439 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.36it/s]


** End of epoch, accumulated average loss = 4.748255 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.06it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.53it/s]
[I 2023-04-11 08:13:37,374] Trial 41 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0003}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.57it/s]


** End of epoch, accumulated average loss = 4.761234 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.97it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.99it/s]
[I 2023-04-11 08:13:40,886] Trial 42 finished with value: 0.1505457282649605 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0003}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 34.75it/s]


** End of epoch, accumulated average loss = 4.980975 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.01it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.93it/s]
[I 2023-04-11 08:13:43,767] Trial 43 finished with value: 0.15101177891875567 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0003}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.03it/s]


** End of epoch, accumulated average loss = 5.081206 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.78it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.66it/s]
[I 2023-04-11 08:13:46,747] Trial 44 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.002}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.21it/s]


** End of epoch, accumulated average loss = 4.962058 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.79it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.00it/s]
[I 2023-04-11 08:13:49,847] Trial 45 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.1, 'dropout_residual': 0.15, 'dropout_relu': 0.15, 'lr_peak': 0.001}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.78it/s]


** End of epoch, accumulated average loss = 4.704555 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.69it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.38it/s]
[I 2023-04-11 08:13:53,544] Trial 46 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 25.16it/s]


** End of epoch, accumulated average loss = 4.884913 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.61it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.86it/s]
[I 2023-04-11 08:13:57,132] Trial 47 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 35.26it/s]


** End of epoch, accumulated average loss = 4.687315 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.94it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.87it/s]
[I 2023-04-11 08:14:00,036] Trial 48 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.1, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.001}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 35.15it/s]


** End of epoch, accumulated average loss = 4.914588 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.88it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:02<00:00,  3.41it/s]
[I 2023-04-11 08:14:04,613] Trial 49 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 11.44it/s]


** End of epoch, accumulated average loss = 4.655740 **
** Elapsed time: 0:00:01**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.14it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.83it/s]
[I 2023-04-11 08:14:08,297] Trial 50 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.001}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 35.17it/s]


** End of epoch, accumulated average loss = 4.878881 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.94it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.96it/s]
[I 2023-04-11 08:14:11,186] Trial 51 finished with value: 0.1505457282649605 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.15, 'lr_peak': 0.0003}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.76it/s]


** End of epoch, accumulated average loss = 4.887288 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.83it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.11it/s]
[I 2023-04-11 08:14:14,078] Trial 52 finished with value: 0.15074998115625235 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.15, 'lr_peak': 0.0003}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.97it/s]


** End of epoch, accumulated average loss = 4.932083 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.42it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:02<00:00,  4.41it/s]
[I 2023-04-11 08:14:18,278] Trial 53 finished with value: 0.1505683956937439 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.15, 'lr_peak': 0.0003}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 23.64it/s]


** End of epoch, accumulated average loss = 4.764856 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.77it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.86it/s]
[I 2023-04-11 08:14:21,381] Trial 54 finished with value: 0.15055706112616682 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.15, 'lr_peak': 0.0003}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.99it/s]


** End of epoch, accumulated average loss = 4.628045 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.04it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.15it/s]
[I 2023-04-11 08:14:24,239] Trial 55 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.39it/s]


** End of epoch, accumulated average loss = 4.745330 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.09it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.53it/s]
[I 2023-04-11 08:14:27,770] Trial 56 finished with value: 0.15063643895458312 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 25.46it/s]


** End of epoch, accumulated average loss = 4.863659 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.73it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.40it/s]
[I 2023-04-11 08:14:31,564] Trial 57 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.002}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.04it/s]


** End of epoch, accumulated average loss = 4.894388 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.91it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.88it/s]
[I 2023-04-11 08:14:34,487] Trial 58 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.58it/s]


** End of epoch, accumulated average loss = 4.739843 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.69it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.89it/s]
[I 2023-04-11 08:14:37,454] Trial 59 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.001}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.06it/s]


** End of epoch, accumulated average loss = 4.754284 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.61it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.82it/s]
[I 2023-04-11 08:14:40,440] Trial 60 finished with value: 0.15055706112616682 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.1, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0003}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.24it/s]


** End of epoch, accumulated average loss = 4.957761 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.51it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.18it/s]
[I 2023-04-11 08:14:44,274] Trial 61 finished with value: 0.15460729746444032 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0003}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.63it/s]


** End of epoch, accumulated average loss = 4.936531 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.83it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.73it/s]
[I 2023-04-11 08:14:47,231] Trial 62 finished with value: 0.1505457282649605 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0003}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.01it/s]


** End of epoch, accumulated average loss = 4.841125 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.58it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.77it/s]
[I 2023-04-11 08:14:50,231] Trial 63 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0003}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.39it/s]


** End of epoch, accumulated average loss = 4.845870 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.92it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.67it/s]
[I 2023-04-11 08:14:53,312] Trial 64 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0003}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 28.79it/s]


** End of epoch, accumulated average loss = 4.839818 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.83it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.09it/s]
[I 2023-04-11 08:14:57,110] Trial 65 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0003}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.31it/s]


** End of epoch, accumulated average loss = 5.106478 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.06it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.68it/s]
[I 2023-04-11 08:15:00,652] Trial 66 finished with value: 0.15065913370998116 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 34.03it/s]


** End of epoch, accumulated average loss = 4.930083 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.90it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.87it/s]
[I 2023-04-11 08:15:03,592] Trial 67 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.2, 'dropout_relu': 0.2, 'lr_peak': 0.0003}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.42it/s]


** End of epoch, accumulated average loss = 4.684851 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.78it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.90it/s]
[I 2023-04-11 08:15:06,559] Trial 68 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.001}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.10it/s]


** End of epoch, accumulated average loss = 5.082620 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.55it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.53it/s]
[I 2023-04-11 08:15:10,473] Trial 69 finished with value: 0.15114873035066506 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0003}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.06it/s]


** End of epoch, accumulated average loss = 4.605558 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.72it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.88it/s]
[I 2023-04-11 08:15:13,457] Trial 70 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.78it/s]


** End of epoch, accumulated average loss = 4.735822 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.83it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.89it/s]
[I 2023-04-11 08:15:16,391] Trial 71 finished with value: 0.1505683956937439 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0003}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.20it/s]


** End of epoch, accumulated average loss = 4.809413 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.84it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.82it/s]
[I 2023-04-11 08:15:19,347] Trial 72 finished with value: 0.1522301720200944 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0003}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 23.94it/s]


** End of epoch, accumulated average loss = 4.862636 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.64it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.61it/s]
[I 2023-04-11 08:15:23,138] Trial 73 finished with value: 0.15097757982939533 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0003}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.60it/s]


** End of epoch, accumulated average loss = 5.263809 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.82it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.06it/s]
[I 2023-04-11 08:15:26,786] Trial 74 finished with value: 0.15775358889414734 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0003}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 19.65it/s]


** End of epoch, accumulated average loss = 4.975752 **
** Elapsed time: 0:00:01**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.54it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.52it/s]
[I 2023-04-11 08:15:30,579] Trial 75 finished with value: 0.1505457282649605 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0003}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 23.16it/s]


** End of epoch, accumulated average loss = 4.763741 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.73it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.70it/s]
[I 2023-04-11 08:15:34,157] Trial 76 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.002}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.84it/s]


** End of epoch, accumulated average loss = 4.802751 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.70it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.38it/s]
[I 2023-04-11 08:15:38,098] Trial 77 finished with value: 0.1505457282649605 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.2, 'dropout_relu': 0.2, 'lr_peak': 0.0003}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.69it/s]


** End of epoch, accumulated average loss = 4.724948 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.68it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.75it/s]
[I 2023-04-11 08:15:41,090] Trial 78 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.1, 'dropout_residual': 0.15, 'dropout_relu': 0.15, 'lr_peak': 0.001}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.36it/s]


** End of epoch, accumulated average loss = 4.718278 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.82it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.87it/s]
[I 2023-04-11 08:15:44,031] Trial 79 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.15, 'lr_peak': 0.0003}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.57it/s]


** End of epoch, accumulated average loss = 4.887388 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.67it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.55it/s]
[I 2023-04-11 08:15:47,734] Trial 80 finished with value: 0.15150367396409362 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 23.33it/s]


** End of epoch, accumulated average loss = 5.172561 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.69it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.85it/s]
[I 2023-04-11 08:15:50,872] Trial 81 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0003}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.52it/s]


** End of epoch, accumulated average loss = 5.046199 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.12it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.09it/s]
[I 2023-04-11 08:15:53,732] Trial 82 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0003}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.35it/s]


** End of epoch, accumulated average loss = 4.733790 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.69it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.83it/s]
[I 2023-04-11 08:15:56,710] Trial 83 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0003}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.05it/s]


** End of epoch, accumulated average loss = 4.755777 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.01it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.72it/s]
[I 2023-04-11 08:16:00,279] Trial 84 finished with value: 0.1505683956937439 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0003}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 14.43it/s]


** End of epoch, accumulated average loss = 5.110977 **
** Elapsed time: 0:00:01**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.13it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.86it/s]
[I 2023-04-11 08:16:04,018] Trial 85 finished with value: 0.15055706112616682 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.79it/s]


** End of epoch, accumulated average loss = 4.943086 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.82it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.77it/s]
[I 2023-04-11 08:16:06,971] Trial 86 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0003}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.82it/s]


** End of epoch, accumulated average loss = 5.140656 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.62it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.50it/s]
[I 2023-04-11 08:16:10,010] Trial 87 finished with value: 0.15884361845762848 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.001}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.97it/s]


** End of epoch, accumulated average loss = 5.260133 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.08it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.42it/s]
[I 2023-04-11 08:16:13,889] Trial 88 finished with value: 0.1514807240778611 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.001}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.38it/s]


** End of epoch, accumulated average loss = 5.431018 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.77it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.71it/s]
[I 2023-04-11 08:16:17,005] Trial 89 finished with value: 0.1508409382306358 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.001}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.67it/s]


** End of epoch, accumulated average loss = 5.020133 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.60it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.33it/s]
[I 2023-04-11 08:16:20,081] Trial 90 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.001}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.70it/s]


** End of epoch, accumulated average loss = 4.924117 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.68it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.20it/s]
[I 2023-04-11 08:16:23,173] Trial 91 finished with value: 0.15057973196807709 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.001}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.40it/s]


** End of epoch, accumulated average loss = 4.767116 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.17it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.49it/s]
[I 2023-04-11 08:16:27,018] Trial 92 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0003}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.59it/s]


** End of epoch, accumulated average loss = 5.095846 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.83it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.70it/s]
[I 2023-04-11 08:16:30,008] Trial 93 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.001}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.38it/s]


** End of epoch, accumulated average loss = 4.779464 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.02it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.75it/s]
[I 2023-04-11 08:16:32,945] Trial 94 finished with value: 0.1512401693889897 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.15, 'lr_peak': 0.0003}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.43it/s]


** End of epoch, accumulated average loss = 5.017185 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.68it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.77it/s]
[I 2023-04-11 08:16:35,949] Trial 95 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.002}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.13it/s]


** End of epoch, accumulated average loss = 4.844791 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.29it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.50it/s]
[I 2023-04-11 08:16:39,773] Trial 96 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.1, 'dropout_residual': 0.2, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 29.54it/s]


** End of epoch, accumulated average loss = 4.647059 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.61it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.90it/s]
[I 2023-04-11 08:16:42,810] Trial 97 finished with value: 0.1505457282649605 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.58it/s]


** End of epoch, accumulated average loss = 5.160325 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.97it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.68it/s]
[I 2023-04-11 08:16:45,770] Trial 98 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.001}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.18it/s]


** End of epoch, accumulated average loss = 5.210948 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.72it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.60it/s]
[I 2023-04-11 08:16:48,974] Trial 99 finished with value: 0.15072725902479464 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0003}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 12.88it/s]


** End of epoch, accumulated average loss = 4.985841 **
** Elapsed time: 0:00:01**
Using GPU device: cuda


100%|██████████| 10/10 [00:02<00:00,  3.86it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.93it/s]
[I 2023-04-11 08:16:53,936] Trial 100 finished with value: 0.15090922809929827 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 23.63it/s]


** End of epoch, accumulated average loss = 4.892057 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.71it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.59it/s]
[I 2023-04-11 08:16:57,077] Trial 101 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0003}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.93it/s]


** End of epoch, accumulated average loss = 4.920659 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.77it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.86it/s]
[I 2023-04-11 08:17:00,042] Trial 102 finished with value: 0.1505457282649605 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0003}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.59it/s]


** End of epoch, accumulated average loss = 5.039761 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.66it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.88it/s]
[I 2023-04-11 08:17:03,015] Trial 103 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0003}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.92it/s]


** End of epoch, accumulated average loss = 5.627387 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.64it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.90it/s]
[I 2023-04-11 08:17:06,827] Trial 104 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0003}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.27it/s]


** End of epoch, accumulated average loss = 4.877744 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.66it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.69it/s]
[I 2023-04-11 08:17:09,853] Trial 105 finished with value: 0.15076134479119555 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0003}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.08it/s]


** End of epoch, accumulated average loss = 4.730682 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.80it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.75it/s]
[I 2023-04-11 08:17:12,862] Trial 106 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.20it/s]


** End of epoch, accumulated average loss = 4.898003 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.86it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.58it/s]
[I 2023-04-11 08:17:15,871] Trial 107 finished with value: 0.150591069949552 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0003}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.59it/s]


** End of epoch, accumulated average loss = 4.644327 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.59it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.79it/s]
[I 2023-04-11 08:17:19,709] Trial 108 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.15, 'lr_peak': 0.001}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.34it/s]


** End of epoch, accumulated average loss = 4.764283 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.63it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.55it/s]
[I 2023-04-11 08:17:22,747] Trial 109 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.15it/s]


** End of epoch, accumulated average loss = 5.012957 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.78it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.44it/s]
[I 2023-04-11 08:17:25,796] Trial 110 finished with value: 0.15067048365225252 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.1, 'dropout_residual': 0.2, 'dropout_relu': 0.2, 'lr_peak': 0.0003}. Best is trial 6 with value: 0.16.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.33it/s]


** End of epoch, accumulated average loss = 5.591583 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.64it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.35it/s]
[I 2023-04-11 08:17:28,867] Trial 111 finished with value: 0.16446015952635476 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.19it/s]


** End of epoch, accumulated average loss = 4.811277 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.59it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.84it/s]
[I 2023-04-11 08:17:32,718] Trial 112 finished with value: 0.15057973196807709 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.19it/s]


** End of epoch, accumulated average loss = 5.047249 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.68it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.60it/s]
[I 2023-04-11 08:17:35,864] Trial 113 finished with value: 0.15814027041986242 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.61it/s]


** End of epoch, accumulated average loss = 4.681606 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.72it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.63it/s]
[I 2023-04-11 08:17:38,878] Trial 114 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.43it/s]


** End of epoch, accumulated average loss = 4.779328 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.87it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.57it/s]
[I 2023-04-11 08:17:41,891] Trial 115 finished with value: 0.15065913370998116 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.65it/s]


** End of epoch, accumulated average loss = 5.142775 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.63it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.12it/s]
[I 2023-04-11 08:17:45,664] Trial 116 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.91it/s]


** End of epoch, accumulated average loss = 4.787619 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.77it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.70it/s]
[I 2023-04-11 08:17:48,648] Trial 117 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.002}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.24it/s]


** End of epoch, accumulated average loss = 4.666565 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.79it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.65it/s]
[I 2023-04-11 08:17:51,664] Trial 118 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.33it/s]


** End of epoch, accumulated average loss = 4.791489 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.04it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.83it/s]
[I 2023-04-11 08:17:54,585] Trial 119 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.06it/s]


** End of epoch, accumulated average loss = 4.744607 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.50it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.79it/s]
[I 2023-04-11 08:17:58,485] Trial 120 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.001}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.62it/s]


** End of epoch, accumulated average loss = 4.747124 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.65it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.48it/s]
[I 2023-04-11 08:18:01,544] Trial 121 finished with value: 0.15097757982939533 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.68it/s]


** End of epoch, accumulated average loss = 4.989400 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.96it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.61it/s]
[I 2023-04-11 08:18:04,538] Trial 122 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.73it/s]


** End of epoch, accumulated average loss = 5.069829 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.63it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.43it/s]
[I 2023-04-11 08:18:07,612] Trial 123 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 20.96it/s]


** End of epoch, accumulated average loss = 4.910316 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.42it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.89it/s]
[I 2023-04-11 08:18:11,537] Trial 124 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 23.03it/s]


** End of epoch, accumulated average loss = 4.742822 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.82it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.89it/s]
[I 2023-04-11 08:18:14,608] Trial 125 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.45it/s]


** End of epoch, accumulated average loss = 4.664876 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.78it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.69it/s]
[I 2023-04-11 08:18:17,613] Trial 126 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.09it/s]


** End of epoch, accumulated average loss = 4.718542 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.86it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.56it/s]
[I 2023-04-11 08:18:20,616] Trial 127 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.26it/s]


** End of epoch, accumulated average loss = 4.886499 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.55it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.88it/s]
[I 2023-04-11 08:18:24,476] Trial 128 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.15, 'lr_peak': 0.001}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.62it/s]


** End of epoch, accumulated average loss = 5.354021 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.70it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.64it/s]
[I 2023-04-11 08:18:27,489] Trial 129 finished with value: 0.150591069949552 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.90it/s]


** End of epoch, accumulated average loss = 4.958768 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.79it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.77it/s]
[I 2023-04-11 08:18:30,485] Trial 130 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.14it/s]


** End of epoch, accumulated average loss = 4.798007 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.72it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.79it/s]
[I 2023-04-11 08:18:33,475] Trial 131 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.00it/s]


** End of epoch, accumulated average loss = 4.910594 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.71it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.60it/s]
[I 2023-04-11 08:18:37,325] Trial 132 finished with value: 0.16294606485253382 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.93it/s]


** End of epoch, accumulated average loss = 4.673464 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.80it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.82it/s]
[I 2023-04-11 08:18:40,299] Trial 133 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.41it/s]


** End of epoch, accumulated average loss = 4.891408 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.71it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.96it/s]
[I 2023-04-11 08:18:43,260] Trial 134 finished with value: 0.1513660788617271 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.96it/s]


** End of epoch, accumulated average loss = 4.977012 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.76it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.56it/s]
[I 2023-04-11 08:18:46,277] Trial 135 finished with value: 0.1507386192342478 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 25.13it/s]


** End of epoch, accumulated average loss = 4.951130 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.94it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.35it/s]
[I 2023-04-11 08:18:50,030] Trial 136 finished with value: 0.1505457282649605 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.15, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.25it/s]


** End of epoch, accumulated average loss = 4.813629 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.62it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.64it/s]
[I 2023-04-11 08:18:53,045] Trial 137 finished with value: 0.15060240963855423 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.1, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.84it/s]


** End of epoch, accumulated average loss = 4.780009 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.80it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.53it/s]
[I 2023-04-11 08:18:56,171] Trial 138 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.2, 'dropout_relu': 0.1, 'lr_peak': 0.001}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 34.02it/s]


** End of epoch, accumulated average loss = 4.951676 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.64it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.76it/s]
[I 2023-04-11 08:18:59,162] Trial 139 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.15, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.16it/s]


** End of epoch, accumulated average loss = 5.632247 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.56it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.27it/s]
[I 2023-04-11 08:19:02,990] Trial 140 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.002}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.07it/s]


** End of epoch, accumulated average loss = 5.276992 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.69it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.81it/s]
[I 2023-04-11 08:19:05,981] Trial 141 finished with value: 0.15150367396409362 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.19it/s]


** End of epoch, accumulated average loss = 4.611150 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.75it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.82it/s]
[I 2023-04-11 08:19:08,961] Trial 142 finished with value: 0.152473888846535 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 29.37it/s]


** End of epoch, accumulated average loss = 4.995517 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.58it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.77it/s]
[I 2023-04-11 08:19:12,011] Trial 143 finished with value: 0.15144631228229594 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.85it/s]


** End of epoch, accumulated average loss = 4.693986 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.51it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.16it/s]
[I 2023-04-11 08:19:15,862] Trial 144 finished with value: 0.1516530178950561 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.76it/s]


** End of epoch, accumulated average loss = 4.854495 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.73it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.80it/s]
[I 2023-04-11 08:19:18,846] Trial 145 finished with value: 0.15106881184379484 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.17it/s]


** End of epoch, accumulated average loss = 4.737835 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.71it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.85it/s]
[I 2023-04-11 08:19:21,831] Trial 146 finished with value: 0.15152663080536405 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 28.55it/s]


** End of epoch, accumulated average loss = 5.003299 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.89it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.86it/s]
[I 2023-04-11 08:19:24,819] Trial 147 finished with value: 0.15120586678763134 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.80it/s]


** End of epoch, accumulated average loss = 4.637560 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.73it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.27it/s]
[I 2023-04-11 08:19:28,563] Trial 148 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.05it/s]


** End of epoch, accumulated average loss = 5.187629 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.88it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.91it/s]
[I 2023-04-11 08:19:31,498] Trial 149 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.00it/s]


** End of epoch, accumulated average loss = 5.083476 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.93it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.88it/s]
[I 2023-04-11 08:19:34,551] Trial 150 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.001}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 34.09it/s]


** End of epoch, accumulated average loss = 4.824022 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.68it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.81it/s]
[I 2023-04-11 08:19:37,532] Trial 151 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.15, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.15it/s]


** End of epoch, accumulated average loss = 4.875300 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.83it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.90it/s]
[I 2023-04-11 08:19:41,361] Trial 152 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.05it/s]


** End of epoch, accumulated average loss = 4.736991 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.72it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.84it/s]
[I 2023-04-11 08:19:44,328] Trial 153 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.20it/s]


** End of epoch, accumulated average loss = 4.860497 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.91it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.88it/s]
[I 2023-04-11 08:19:47,272] Trial 154 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.23it/s]


** End of epoch, accumulated average loss = 4.682363 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.66it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.91it/s]
[I 2023-04-11 08:19:50,282] Trial 155 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.49it/s]


** End of epoch, accumulated average loss = 4.870478 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.18it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.56it/s]
[I 2023-04-11 08:19:54,128] Trial 156 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.38it/s]


** End of epoch, accumulated average loss = 5.116563 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.61it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.60it/s]
[I 2023-04-11 08:19:57,154] Trial 157 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.13it/s]


** End of epoch, accumulated average loss = 4.842563 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.27it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.42it/s]
[I 2023-04-11 08:20:00,315] Trial 158 finished with value: 0.15061375103546953 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.15, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 29.96it/s]


** End of epoch, accumulated average loss = 4.906031 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.47it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.73it/s]
[I 2023-04-11 08:20:03,399] Trial 159 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.001}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.84it/s]


** End of epoch, accumulated average loss = 4.822415 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.78it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.58it/s]
[I 2023-04-11 08:20:07,351] Trial 160 finished with value: 0.15057973196807709 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.72it/s]


** End of epoch, accumulated average loss = 4.750317 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.47it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.41it/s]
[I 2023-04-11 08:20:10,469] Trial 161 finished with value: 0.15071590052750566 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.03it/s]


** End of epoch, accumulated average loss = 4.715088 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.66it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.65it/s]
[I 2023-04-11 08:20:13,527] Trial 162 finished with value: 0.15142337976983647 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 23.05it/s]


** End of epoch, accumulated average loss = 4.713706 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.47it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.55it/s]
[I 2023-04-11 08:20:16,733] Trial 163 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.94it/s]


** End of epoch, accumulated average loss = 4.873720 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.46it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:02<00:00,  4.02it/s]
[I 2023-04-11 08:20:21,490] Trial 164 finished with value: 0.1541307028360049 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.74it/s]


** End of epoch, accumulated average loss = 4.915939 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.33it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.55it/s]
[I 2023-04-11 08:20:24,611] Trial 165 finished with value: 0.15071590052750566 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.32it/s]


** End of epoch, accumulated average loss = 5.137596 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.37it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.34it/s]
[I 2023-04-11 08:20:27,760] Trial 166 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.1, 'dropout_residual': 0.2, 'dropout_relu': 0.2, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.91it/s]


** End of epoch, accumulated average loss = 4.626863 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.50it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.62it/s]
[I 2023-04-11 08:20:31,024] Trial 167 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.15, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.55it/s]


** End of epoch, accumulated average loss = 4.797415 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.36it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.53it/s]
[I 2023-04-11 08:20:34,792] Trial 168 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.27it/s]


** End of epoch, accumulated average loss = 4.772807 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.53it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.64it/s]
[I 2023-04-11 08:20:37,852] Trial 169 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.002}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.11it/s]


** End of epoch, accumulated average loss = 5.159946 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.86it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.75it/s]
[I 2023-04-11 08:20:40,826] Trial 170 finished with value: 0.15062509414068384 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.06it/s]


** End of epoch, accumulated average loss = 4.746491 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:02<00:00,  4.13it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:02<00:00,  3.51it/s]
[I 2023-04-11 08:20:46,509] Trial 171 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.56it/s]


** End of epoch, accumulated average loss = 4.913268 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.26it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.11it/s]
[I 2023-04-11 08:20:50,274] Trial 172 finished with value: 0.150591069949552 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 15.63it/s]


** End of epoch, accumulated average loss = 4.670097 **
** Elapsed time: 0:00:01**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.06it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.35it/s]
[I 2023-04-11 08:20:53,902] Trial 173 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.44it/s]


** End of epoch, accumulated average loss = 4.890592 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.57it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.50it/s]
[I 2023-04-11 08:20:57,191] Trial 174 finished with value: 0.1563232765358762 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.64it/s]


** End of epoch, accumulated average loss = 4.687829 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.58it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.53it/s]
[I 2023-04-11 08:21:00,887] Trial 175 finished with value: 0.15137753557372086 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.98it/s]


** End of epoch, accumulated average loss = 4.616262 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:02<00:00,  4.47it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.60it/s]
[I 2023-04-11 08:21:04,863] Trial 176 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.15, 'lr_peak': 0.001}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.68it/s]


** End of epoch, accumulated average loss = 4.763275 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.49it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.61it/s]
[I 2023-04-11 08:21:08,056] Trial 177 finished with value: 0.1505457282649605 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.00it/s]


** End of epoch, accumulated average loss = 4.768795 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.03it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:02<00:00,  4.74it/s]
[I 2023-04-11 08:21:12,004] Trial 178 finished with value: 0.15057973196807709 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 10.05it/s]


** End of epoch, accumulated average loss = 5.415158 **
** Elapsed time: 0:00:01**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.53it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.53it/s]
[I 2023-04-11 08:21:16,105] Trial 179 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.06it/s]


** End of epoch, accumulated average loss = 5.230503 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.62it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.59it/s]
[I 2023-04-11 08:21:19,152] Trial 180 finished with value: 0.15100037750094375 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.67it/s]


** End of epoch, accumulated average loss = 4.812031 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.57it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:02<00:00,  4.73it/s]
[I 2023-04-11 08:21:22,998] Trial 181 finished with value: 0.15055706112616682 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 19.22it/s]


** End of epoch, accumulated average loss = 4.591934 **
** Elapsed time: 0:00:01**
Using GPU device: cuda


100%|██████████| 10/10 [00:02<00:00,  3.61it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.61it/s]
[I 2023-04-11 08:21:27,762] Trial 182 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.00it/s]


** End of epoch, accumulated average loss = 4.814715 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.60it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.52it/s]
[I 2023-04-11 08:21:30,836] Trial 183 finished with value: 0.15137753557372086 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.27it/s]


** End of epoch, accumulated average loss = 4.909172 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.51it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.67it/s]
[I 2023-04-11 08:21:33,893] Trial 184 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.54it/s]


** End of epoch, accumulated average loss = 4.820120 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.63it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.26it/s]
[I 2023-04-11 08:21:37,230] Trial 185 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 20.91it/s]


** End of epoch, accumulated average loss = 4.794899 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.77it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.45it/s]
[I 2023-04-11 08:21:40,952] Trial 186 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.85it/s]


** End of epoch, accumulated average loss = 4.791835 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.70it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.68it/s]
[I 2023-04-11 08:21:43,960] Trial 187 finished with value: 0.15643332029722332 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.001}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.20it/s]


** End of epoch, accumulated average loss = 4.895674 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.52it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.48it/s]
[I 2023-04-11 08:21:47,041] Trial 188 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.001}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.80it/s]


** End of epoch, accumulated average loss = 4.844971 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.47it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.03it/s]
[I 2023-04-11 08:21:50,453] Trial 189 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.001}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 15.78it/s]


** End of epoch, accumulated average loss = 4.626607 **
** Elapsed time: 0:00:01**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.12it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.44it/s]
[I 2023-04-11 08:21:54,226] Trial 190 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.001}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.79it/s]


** End of epoch, accumulated average loss = 5.005041 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.50it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.57it/s]
[I 2023-04-11 08:21:57,312] Trial 191 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.001}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.09it/s]


** End of epoch, accumulated average loss = 4.819814 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.47it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.49it/s]
[I 2023-04-11 08:22:00,411] Trial 192 finished with value: 0.15348016268897244 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.001}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.68it/s]


** End of epoch, accumulated average loss = 4.907296 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.55it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:02<00:00,  3.65it/s]
[I 2023-04-11 08:22:04,920] Trial 193 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.001}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 13.73it/s]


** End of epoch, accumulated average loss = 4.674386 **
** Elapsed time: 0:00:01**
Using GPU device: cuda


100%|██████████| 10/10 [00:02<00:00,  4.25it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.86it/s]
[I 2023-04-11 08:22:09,739] Trial 194 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.001}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.47it/s]


** End of epoch, accumulated average loss = 4.816774 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.49it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.79it/s]
[I 2023-04-11 08:22:13,239] Trial 195 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.001}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 15.92it/s]


** End of epoch, accumulated average loss = 5.679745 **
** Elapsed time: 0:00:01**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.03it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.54it/s]
[I 2023-04-11 08:22:17,500] Trial 196 finished with value: 0.15106881184379484 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.001}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 20.94it/s]


** End of epoch, accumulated average loss = 4.901088 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.87it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.62it/s]
[I 2023-04-11 08:22:20,895] Trial 197 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.2, 'dropout_relu': 0.1, 'lr_peak': 0.001}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.43it/s]


** End of epoch, accumulated average loss = 4.926349 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.04it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.30it/s]
[I 2023-04-11 08:22:24,623] Trial 198 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.15, 'lr_peak': 0.001}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.31it/s]


** End of epoch, accumulated average loss = 4.714893 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.50it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.51it/s]
[I 2023-04-11 08:22:27,731] Trial 199 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.1, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.17it/s]


** End of epoch, accumulated average loss = 4.937532 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:03<00:00,  2.53it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:02<00:00,  4.81it/s]
[I 2023-04-11 08:22:34,204] Trial 200 finished with value: 0.15064778547755348 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.34it/s]


** End of epoch, accumulated average loss = 4.645785 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.86it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.55it/s]
[I 2023-04-11 08:22:37,228] Trial 201 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 27.80it/s]


** End of epoch, accumulated average loss = 4.896237 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.45it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.02it/s]
[I 2023-04-11 08:22:41,209] Trial 202 finished with value: 0.1512401693889897 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 15.67it/s]


** End of epoch, accumulated average loss = 4.743885 **
** Elapsed time: 0:00:01**
Using GPU device: cuda


100%|██████████| 10/10 [00:02<00:00,  4.58it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.08it/s]
[I 2023-04-11 08:22:45,788] Trial 203 finished with value: 0.1507386192342478 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.37it/s]


** End of epoch, accumulated average loss = 4.935995 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.66it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.60it/s]
[I 2023-04-11 08:22:48,827] Trial 204 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.91it/s]


** End of epoch, accumulated average loss = 4.814398 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:02<00:00,  4.90it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.02it/s]
[I 2023-04-11 08:22:53,389] Trial 205 finished with value: 0.15176809834572771 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 25.78it/s]


** End of epoch, accumulated average loss = 5.149815 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:02<00:00,  4.48it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:03<00:00,  2.68it/s]
[I 2023-04-11 08:22:59,998] Trial 206 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 25.03it/s]


** End of epoch, accumulated average loss = 4.847766 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:02<00:00,  4.38it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.74it/s]
[I 2023-04-11 08:23:04,100] Trial 207 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.06it/s]


** End of epoch, accumulated average loss = 4.964129 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.73it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.51it/s]
[I 2023-04-11 08:23:07,159] Trial 208 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.15, 'lr_peak': 0.002}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.26it/s]


** End of epoch, accumulated average loss = 4.661549 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.87it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.49it/s]
[I 2023-04-11 08:23:10,877] Trial 209 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.001}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.99it/s]


** End of epoch, accumulated average loss = 4.716397 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.25it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.58it/s]
[I 2023-04-11 08:23:14,168] Trial 210 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.05it/s]


** End of epoch, accumulated average loss = 4.791606 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.69it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.66it/s]
[I 2023-04-11 08:23:17,212] Trial 211 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.86it/s]


** End of epoch, accumulated average loss = 4.927136 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.48it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.34it/s]
[I 2023-04-11 08:23:20,352] Trial 212 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.27it/s]


** End of epoch, accumulated average loss = 4.776275 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.93it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.59it/s]
[I 2023-04-11 08:23:24,021] Trial 213 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 19.42it/s]


** End of epoch, accumulated average loss = 4.741637 **
** Elapsed time: 0:00:01**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.02it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.66it/s]
[I 2023-04-11 08:23:27,413] Trial 214 finished with value: 0.150591069949552 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.26it/s]


** End of epoch, accumulated average loss = 4.943541 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.51it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.59it/s]
[I 2023-04-11 08:23:30,494] Trial 215 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.95it/s]


** End of epoch, accumulated average loss = 4.809137 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.48it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.62it/s]
[I 2023-04-11 08:23:33,583] Trial 216 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 29.95it/s]


** End of epoch, accumulated average loss = 5.238991 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.31it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.62it/s]
[I 2023-04-11 08:23:37,397] Trial 217 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 18.09it/s]


** End of epoch, accumulated average loss = 4.812843 **
** Elapsed time: 0:00:01**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.49it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.36it/s]
[I 2023-04-11 08:23:40,799] Trial 218 finished with value: 0.15055706112616682 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.94it/s]


** End of epoch, accumulated average loss = 4.606580 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.59it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.54it/s]
[I 2023-04-11 08:23:43,876] Trial 219 finished with value: 0.15078407720144754 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.86it/s]


** End of epoch, accumulated average loss = 4.820399 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.47it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.73it/s]
[I 2023-04-11 08:23:46,944] Trial 220 finished with value: 0.15676438313215238 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.40it/s]


** End of epoch, accumulated average loss = 4.596438 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.02it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.44it/s]
[I 2023-04-11 08:23:50,884] Trial 221 finished with value: 0.15057973196807709 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.23it/s]


** End of epoch, accumulated average loss = 4.691072 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.68it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.64it/s]
[I 2023-04-11 08:23:53,914] Trial 222 finished with value: 0.1505457282649605 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.46it/s]


** End of epoch, accumulated average loss = 4.783414 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.53it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.55it/s]
[I 2023-04-11 08:23:57,011] Trial 223 finished with value: 0.15378700499807765 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.03it/s]


** End of epoch, accumulated average loss = 4.945715 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.59it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.49it/s]
[I 2023-04-11 08:24:00,091] Trial 224 finished with value: 0.15074998115625235 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.19it/s]


** End of epoch, accumulated average loss = 4.945680 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:02<00:00,  4.07it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:02<00:00,  3.84it/s]
[I 2023-04-11 08:24:05,581] Trial 225 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.001}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.72it/s]


** End of epoch, accumulated average loss = 4.852026 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.68it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.47it/s]
[I 2023-04-11 08:24:08,663] Trial 226 finished with value: 0.15919764387487065 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.54it/s]


** End of epoch, accumulated average loss = 4.884229 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.21it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.62it/s]
[I 2023-04-11 08:24:11,793] Trial 227 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.37it/s]


** End of epoch, accumulated average loss = 5.075240 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.60it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.15it/s]
[I 2023-04-11 08:24:15,171] Trial 228 finished with value: 0.1516990291262136 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.77it/s]


** End of epoch, accumulated average loss = 4.674360 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.65it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.39it/s]
[I 2023-04-11 08:24:19,373] Trial 229 finished with value: 0.15114873035066506 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 14.53it/s]


** End of epoch, accumulated average loss = 4.971761 **
** Elapsed time: 0:00:01**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.96it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.53it/s]
[I 2023-04-11 08:24:23,026] Trial 230 finished with value: 0.15202189115232592 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.39it/s]


** End of epoch, accumulated average loss = 5.347185 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.49it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.45it/s]
[I 2023-04-11 08:24:26,256] Trial 231 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.14it/s]


** End of epoch, accumulated average loss = 4.955905 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.14it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.28it/s]
[I 2023-04-11 08:24:30,239] Trial 232 finished with value: 0.15158405335758676 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 28.00it/s]


** End of epoch, accumulated average loss = 4.823522 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.41it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.65it/s]
[I 2023-04-11 08:24:33,420] Trial 233 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 29.74it/s]


** End of epoch, accumulated average loss = 4.978606 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.40it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.57it/s]
[I 2023-04-11 08:24:36,541] Trial 234 finished with value: 0.15062509414068384 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.59it/s]


** End of epoch, accumulated average loss = 4.697098 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.55it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.39it/s]
[I 2023-04-11 08:24:39,665] Trial 235 finished with value: 0.15055706112616682 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.68it/s]


** End of epoch, accumulated average loss = 4.711150 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.86it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.56it/s]
[I 2023-04-11 08:24:43,611] Trial 236 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.49it/s]


** End of epoch, accumulated average loss = 4.852110 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.46it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.46it/s]
[I 2023-04-11 08:24:46,737] Trial 237 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.67it/s]


** End of epoch, accumulated average loss = 5.047739 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.49it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.69it/s]
[I 2023-04-11 08:24:49,823] Trial 238 finished with value: 0.15220700152207 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.2, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.21it/s]


** End of epoch, accumulated average loss = 5.179256 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.51it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.55it/s]
[I 2023-04-11 08:24:52,904] Trial 239 finished with value: 0.1528350909368791 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.2, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.49it/s]


** End of epoch, accumulated average loss = 5.000399 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.78it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.41it/s]
[I 2023-04-11 08:24:56,915] Trial 240 finished with value: 0.150681835304754 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.2, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.70it/s]


** End of epoch, accumulated average loss = 4.770328 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.42it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.31it/s]
[I 2023-04-11 08:25:00,073] Trial 241 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.2, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.59it/s]


** End of epoch, accumulated average loss = 5.082066 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.42it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.58it/s]
[I 2023-04-11 08:25:03,189] Trial 242 finished with value: 0.1505457282649605 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.2, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.33it/s]


** End of epoch, accumulated average loss = 4.866188 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.63it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.56it/s]
[I 2023-04-11 08:25:06,274] Trial 243 finished with value: 0.15065913370998116 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.2, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 18.40it/s]


** End of epoch, accumulated average loss = 4.771888 **
** Elapsed time: 0:00:01**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.44it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.13it/s]
[I 2023-04-11 08:25:10,411] Trial 244 finished with value: 0.15228812914033352 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.2, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 25.92it/s]


** End of epoch, accumulated average loss = 5.059781 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.44it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.59it/s]
[I 2023-04-11 08:25:13,577] Trial 245 finished with value: 0.1606425702811245 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.2, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.73it/s]


** End of epoch, accumulated average loss = 4.710857 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.46it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.43it/s]
[I 2023-04-11 08:25:16,694] Trial 246 finished with value: 0.15179113539769276 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.2, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.95it/s]


** End of epoch, accumulated average loss = 4.810725 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.52it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.09it/s]
[I 2023-04-11 08:25:19,892] Trial 247 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.2, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 19.71it/s]


** End of epoch, accumulated average loss = 5.209361 **
** Elapsed time: 0:00:01**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.36it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.90it/s]
[I 2023-04-11 08:25:23,876] Trial 248 finished with value: 0.15097757982939533 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.2, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.42it/s]


** End of epoch, accumulated average loss = 4.950963 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.40it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.49it/s]
[I 2023-04-11 08:25:26,998] Trial 249 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.2, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.88it/s]


** End of epoch, accumulated average loss = 4.769195 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.51it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.43it/s]
[I 2023-04-11 08:25:30,108] Trial 250 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.2, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.28it/s]


** End of epoch, accumulated average loss = 4.786921 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.46it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.53it/s]
[I 2023-04-11 08:25:33,438] Trial 251 finished with value: 0.15082956259426847 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.2, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.16it/s]


** End of epoch, accumulated average loss = 4.692297 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.29it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.33it/s]
[I 2023-04-11 08:25:37,276] Trial 252 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.2, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.48it/s]


** End of epoch, accumulated average loss = 4.795100 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.60it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.55it/s]
[I 2023-04-11 08:25:40,355] Trial 253 finished with value: 0.1522301720200944 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.2, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 28.97it/s]


** End of epoch, accumulated average loss = 4.930878 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.58it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.59it/s]
[I 2023-04-11 08:25:43,463] Trial 254 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.2, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.78it/s]


** End of epoch, accumulated average loss = 4.756360 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.68it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.71it/s]
[I 2023-04-11 08:25:46,694] Trial 255 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.2, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.45it/s]


** End of epoch, accumulated average loss = 4.907161 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.32it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.73it/s]
[I 2023-04-11 08:25:50,475] Trial 256 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.2, 'dropout_relu': 0.2, 'lr_peak': 0.001}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 23.11it/s]


** End of epoch, accumulated average loss = 4.717931 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.62it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.64it/s]
[I 2023-04-11 08:25:53,649] Trial 257 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.2, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.28it/s]


** End of epoch, accumulated average loss = 4.700508 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.67it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.65it/s]
[I 2023-04-11 08:25:56,688] Trial 258 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.2, 'dropout_relu': 0.2, 'lr_peak': 0.001}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.71it/s]


** End of epoch, accumulated average loss = 5.062416 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.67it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.25it/s]
[I 2023-04-11 08:26:00,032] Trial 259 finished with value: 0.1505683956937439 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.2, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 19.51it/s]


** End of epoch, accumulated average loss = 5.078601 **
** Elapsed time: 0:00:01**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.71it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.62it/s]
[I 2023-04-11 08:26:03,772] Trial 260 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.1, 'dropout_residual': 0.2, 'dropout_relu': 0.2, 'lr_peak': 0.002}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.73it/s]


** End of epoch, accumulated average loss = 4.843792 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.87it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.81it/s]
[I 2023-04-11 08:26:07,216] Trial 261 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.2, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.92it/s]


** End of epoch, accumulated average loss = 4.842930 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.33it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.44it/s]
[I 2023-04-11 08:26:10,345] Trial 262 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.001}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 29.46it/s]


** End of epoch, accumulated average loss = 5.025598 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.45it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.61it/s]
[I 2023-04-11 08:26:13,932] Trial 263 finished with value: 0.15118300703000984 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 20.83it/s]


** End of epoch, accumulated average loss = 5.269073 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.60it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.55it/s]
[I 2023-04-11 08:26:17,401] Trial 264 finished with value: 0.15172204521316948 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.15, 'lr_peak': 0.001}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 29.75it/s]


** End of epoch, accumulated average loss = 5.278537 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.35it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.43it/s]
[I 2023-04-11 08:26:20,560] Trial 265 finished with value: 0.15080681646810434 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.15it/s]


** End of epoch, accumulated average loss = 4.785493 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.63it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.81it/s]
[I 2023-04-11 08:26:23,614] Trial 266 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.58it/s]


** End of epoch, accumulated average loss = 4.771408 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.97it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.50it/s]
[I 2023-04-11 08:26:27,333] Trial 267 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.15, 'lr_peak': 0.001}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 19.14it/s]


** End of epoch, accumulated average loss = 5.173033 **
** Elapsed time: 0:00:01**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.99it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.42it/s]
[I 2023-04-11 08:26:30,796] Trial 268 finished with value: 0.150681835304754 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 28.39it/s]


** End of epoch, accumulated average loss = 4.817702 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.48it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.40it/s]
[I 2023-04-11 08:26:33,967] Trial 269 finished with value: 0.15150367396409362 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.1, 'dropout_residual': 0.2, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 23.42it/s]


** End of epoch, accumulated average loss = 4.899633 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.54it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.51it/s]
[I 2023-04-11 08:26:37,165] Trial 270 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.001}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.93it/s]


** End of epoch, accumulated average loss = 4.803029 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.18it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.50it/s]
[I 2023-04-11 08:26:41,032] Trial 271 finished with value: 0.1505457282649605 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 23.24it/s]


** End of epoch, accumulated average loss = 4.767547 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.61it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.57it/s]
[I 2023-04-11 08:26:44,256] Trial 272 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.79it/s]


** End of epoch, accumulated average loss = 4.812882 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.45it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.51it/s]
[I 2023-04-11 08:26:47,384] Trial 273 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.07it/s]


** End of epoch, accumulated average loss = 5.030587 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.40it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.29it/s]
[I 2023-04-11 08:26:50,565] Trial 274 finished with value: 0.15395273650989147 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.59it/s]


** End of epoch, accumulated average loss = 4.683435 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.07it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.42it/s]
[I 2023-04-11 08:26:54,494] Trial 275 finished with value: 0.1505457282649605 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.51it/s]


** End of epoch, accumulated average loss = 4.854695 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.44it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.41it/s]
[I 2023-04-11 08:26:57,672] Trial 276 finished with value: 0.15260186174271326 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 29.77it/s]


** End of epoch, accumulated average loss = 5.283432 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.67it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.41it/s]
[I 2023-04-11 08:27:00,801] Trial 277 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.13it/s]


** End of epoch, accumulated average loss = 5.004415 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.43it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.67it/s]
[I 2023-04-11 08:27:03,904] Trial 278 finished with value: 0.150591069949552 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.88it/s]


** End of epoch, accumulated average loss = 4.857567 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.77it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.47it/s]
[I 2023-04-11 08:27:07,915] Trial 279 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.62it/s]


** End of epoch, accumulated average loss = 4.897020 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.50it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.00it/s]
[I 2023-04-11 08:27:11,124] Trial 280 finished with value: 0.15072725902479464 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.22it/s]


** End of epoch, accumulated average loss = 5.028591 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.55it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.44it/s]
[I 2023-04-11 08:27:14,211] Trial 281 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.002}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.23it/s]


** End of epoch, accumulated average loss = 4.905503 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.46it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.46it/s]
[I 2023-04-11 08:27:17,462] Trial 282 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 24.28it/s]


** End of epoch, accumulated average loss = 4.794101 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.53it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.95it/s]
[I 2023-04-11 08:27:21,487] Trial 283 finished with value: 0.1505457282649605 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.001}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.96it/s]


** End of epoch, accumulated average loss = 4.603987 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.43it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.50it/s]
[I 2023-04-11 08:27:24,598] Trial 284 finished with value: 0.1508978421608571 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.83it/s]


** End of epoch, accumulated average loss = 4.714583 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.42it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.29it/s]
[I 2023-04-11 08:27:27,759] Trial 285 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.62it/s]


** End of epoch, accumulated average loss = 4.877200 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.44it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.27it/s]
[I 2023-04-11 08:27:30,926] Trial 286 finished with value: 0.15055706112616682 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.25it/s]


** End of epoch, accumulated average loss = 4.602684 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.54it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.35it/s]
[I 2023-04-11 08:27:34,922] Trial 287 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.001}. Best is trial 111 with value: 0.16446015952635476.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.12it/s]


** End of epoch, accumulated average loss = 4.873339 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.41it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.41it/s]
[I 2023-04-11 08:27:38,091] Trial 288 finished with value: 0.16803898504453035 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0003}. Best is trial 288 with value: 0.16803898504453035.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.13it/s]


** End of epoch, accumulated average loss = 4.829214 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.46it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.34it/s]
[I 2023-04-11 08:27:41,233] Trial 289 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0003}. Best is trial 288 with value: 0.16803898504453035.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.41it/s]


** End of epoch, accumulated average loss = 5.241743 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.52it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.21it/s]
[I 2023-04-11 08:27:44,397] Trial 290 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.1, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0003}. Best is trial 288 with value: 0.16803898504453035.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 19.57it/s]


** End of epoch, accumulated average loss = 5.039265 **
** Elapsed time: 0:00:01**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.57it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.57it/s]
[I 2023-04-11 08:27:48,375] Trial 291 finished with value: 0.15057973196807709 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0003}. Best is trial 288 with value: 0.16803898504453035.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.16it/s]


** End of epoch, accumulated average loss = 4.861839 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.47it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.44it/s]
[I 2023-04-11 08:27:51,496] Trial 292 finished with value: 0.1505457282649605 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0003}. Best is trial 288 with value: 0.16803898504453035.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.18it/s]


** End of epoch, accumulated average loss = 5.007767 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.48it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.49it/s]
[I 2023-04-11 08:27:54,629] Trial 293 finished with value: 0.1561158379517602 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0003}. Best is trial 288 with value: 0.16803898504453035.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.05it/s]


** End of epoch, accumulated average loss = 4.691548 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.35it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.66it/s]
[I 2023-04-11 08:27:57,945] Trial 294 finished with value: 0.15057973196807709 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0003}. Best is trial 288 with value: 0.16803898504453035.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 19.76it/s]


** End of epoch, accumulated average loss = 5.241682 **
** Elapsed time: 0:00:01**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.51it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.23it/s]
[I 2023-04-11 08:28:01,811] Trial 295 finished with value: 0.15062509414068384 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0003}. Best is trial 288 with value: 0.16803898504453035.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 29.81it/s]


** End of epoch, accumulated average loss = 4.903786 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.32it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.37it/s]
[I 2023-04-11 08:28:04,988] Trial 296 finished with value: 0.1506931886678722 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0003}. Best is trial 288 with value: 0.16803898504453035.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.45it/s]


** End of epoch, accumulated average loss = 5.114142 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.49it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.49it/s]
[I 2023-04-11 08:28:08,224] Trial 297 finished with value: 0.15055706112616682 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.001}. Best is trial 288 with value: 0.16803898504453035.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.46it/s]


** End of epoch, accumulated average loss = 4.865828 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.38it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.04it/s]
[I 2023-04-11 08:28:11,687] Trial 298 finished with value: 0.1507386192342478 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0003}. Best is trial 288 with value: 0.16803898504453035.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 20.49it/s]


** End of epoch, accumulated average loss = 4.771639 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.54it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.44it/s]
[I 2023-04-11 08:28:15,478] Trial 299 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0003}. Best is trial 288 with value: 0.16803898504453035.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.53it/s]


** End of epoch, accumulated average loss = 4.771484 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.55it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.49it/s]
[I 2023-04-11 08:28:18,592] Trial 300 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0003}. Best is trial 288 with value: 0.16803898504453035.


Best trial:
  Score: 0.168
  Params:
    dropout_embedding: 0.15
    dropout_attention: 0.2
    dropout_residual: 0.1
    dropout_relu: 0.15
    lr_peak: 0.0003
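
The score printed above is not a distance itself. Assuming this study used the same objective as the cell in the next section, which returns `1 / mean_ld@1`, and assuming `mean_ld@1` is a mean edit distance where lower is better, the score can be inverted to recover the metric. The helper below is a minimal sketch; `score_to_mean_ld` is a hypothetical name and not part of the notebook.

```python
def score_to_mean_ld(score: float) -> float:
    """Invert an objective of the form 1 / mean_ld back into mean_ld.

    Hypothetical helper: assumes the Optuna objective returned 1 / mean_ld@1.
    """
    if score <= 0:
        raise ValueError("score must be positive to be inverted")
    return 1.0 / score


# Values taken from the log above: the best trial (288) and the most common value.
best_score = 0.16803898504453035
typical_score = 0.1505343971097396
print(f"best trial mean_ld@1    ~ {score_to_mean_ld(best_score):.2f}")
print(f"typical trial mean_ld@1 ~ {score_to_mean_ld(typical_score):.2f}")
```

Under these assumptions the best trial corresponds to a mean distance of roughly 5.95, against roughly 6.64 for the value that most trials share. The gap is small, and each trial trains for a single epoch on the small split, so the ranking of configurations is probably noisy.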


#### Smoothing with alpha = 0.2
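
The heading does not say which kind of smoothing is meant. The sketch below assumes it is label smoothing of the training targets, with the `0.2` passed to `train` acting as the smoothing coefficient. It uses plain NumPy and the uniform-mixture variant, where the target distribution is `(1 - alpha) * one_hot + alpha / vocab_size`; the notebook's own loss may be implemented differently, and `label_smoothed_nll` is a hypothetical name.

```python
import numpy as np


def label_smoothed_nll(log_probs, target_ids, alpha=0.2):
    """Cross-entropy against label-smoothed targets (uniform-mixture variant).

    log_probs:  array of shape (n_tokens, vocab_size) with log-probabilities.
    target_ids: integer array of shape (n_tokens,) with gold token ids.
    alpha:      smoothing coefficient; alpha = 0 gives the usual NLL.
    """
    log_probs = np.asarray(log_probs, dtype=float)
    target_ids = np.asarray(target_ids, dtype=int)
    vocab_size = log_probs.shape[-1]
    one_hot = np.eye(vocab_size)[target_ids]
    # Mix the one-hot target with a uniform distribution over the vocabulary.
    smoothed = (1.0 - alpha) * one_hot + alpha / vocab_size
    return float(-(smoothed * log_probs).sum(axis=-1).mean())


# Toy example: one token, four-symbol vocabulary, gold symbol has id 0.
toy_log_probs = np.log(np.array([[0.7, 0.1, 0.1, 0.1]]))
print(label_smoothed_nll(toy_log_probs, [0], alpha=0.0))
print(label_smoothed_nll(toy_log_probs, [0], alpha=0.2))
```

With `alpha > 0` the loss also penalizes putting very little probability on the non-gold symbols, which discourages over-confident predictions.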

In [None]:
import optuna

# define the objective: sample hyper-parameters, train, and score on the dev set
def objective(trial):
    part2ixy = load_dataset(TRANSLIT_PATH, parts=SCORED_PARTS1)
    train_ids, train_strings, train_transliterations = part2ixy['train_small']
    val_ids, val_strings, val_transliterations = part2ixy['dev_small']
    # hyper-parameter space to search: dropout rates and the peak learning rate
    dropout = {
        'embedding': trial.suggest_categorical('dropout_embedding', [0.1, 0.15, 0.2]),
        'attention': trial.suggest_categorical('dropout_attention', [0.1, 0.15, 0.2]),
        'residual': trial.suggest_categorical('dropout_residual', [0.1, 0.15, 0.2]),
        'relu': trial.suggest_categorical('dropout_relu', [0.1, 0.15, 0.2]),
    }
    train_config = {
        'batch_size': 200, 'n_epochs': 1,
        'lr_scheduler': {
            'type': 'warmup,decay_linear',
            'warmup_steps_part': 0.1,
            'lr_peak': trial.suggest_categorical('lr_peak', [3e-4, 5e-4, 1e-3, 2e-3]),
        },
    }

    # train the model with the current hyper-parameters (smoothing alpha = 0.2)
    learnable_params = train(train_strings, train_transliterations, 1, 0.2, train_config, dropout)

    # transliterate the dev-set source strings (not the references) and score them;
    # the original loop passed the references `y` to classify and kept only the last part's metric
    preds = classify(val_strings, learnable_params)
    metric_values = compute_metrics(np.squeeze(preds), val_transliterations, ['mean_ld@1'])
    # the study maximizes the returned value, so return the inverse of mean_ld@1
    return 1 / metric_values['mean_ld@1']

# run the hyper-parameter search with Optuna
study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=301)

# print the best hyper-parameter values and the corresponding objective score
print('Best trial:')
trial = study.best_trial
print(f'  Score: {trial.value:.3f}')
print('  Params:')
for key, value in trial.params.items():
    print(f'    {key}: {value}')

[I 2023-04-11 08:28:18,633] A new study created in memory with name: no-name-f5150f99-71ed-4f60-b208-360f9769e853


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.89it/s]


** End of epoch, accumulated average loss = 5.037196 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.72it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.77it/s]
[I 2023-04-11 08:28:21,619] Trial 0 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.1, 'dropout_residual': 0.2, 'dropout_relu': 0.2, 'lr_peak': 0.002}. Best is trial 0 with value: 0.1505343971097396.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.40it/s]


** End of epoch, accumulated average loss = 5.063683 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.88it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.54it/s]
[I 2023-04-11 08:28:24,796] Trial 1 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.2, 'dropout_relu': 0.2, 'lr_peak': 0.002}. Best is trial 0 with value: 0.1505343971097396.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 20.44it/s]


** End of epoch, accumulated average loss = 5.051986 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.86it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.87it/s]
[I 2023-04-11 08:28:28,384] Trial 2 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.1, 'dropout_residual': 0.2, 'dropout_relu': 0.2, 'lr_peak': 0.002}. Best is trial 0 with value: 0.1505343971097396.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 34.72it/s]


** End of epoch, accumulated average loss = 5.200973 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.82it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.97it/s]
[I 2023-04-11 08:28:31,293] Trial 3 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 0 with value: 0.1505343971097396.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.62it/s]


** End of epoch, accumulated average loss = 5.532892 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.85it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.57it/s]
[I 2023-04-11 08:28:34,262] Trial 4 finished with value: 0.15080681646810434 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 4 with value: 0.15080681646810434.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.18it/s]


** End of epoch, accumulated average loss = 4.873631 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.88it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.99it/s]
[I 2023-04-11 08:28:37,340] Trial 5 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.001}. Best is trial 4 with value: 0.15080681646810434.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.44it/s]


** End of epoch, accumulated average loss = 4.892796 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.76it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.20it/s]
[I 2023-04-11 08:28:40,888] Trial 6 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.2, 'dropout_relu': 0.2, 'lr_peak': 0.002}. Best is trial 4 with value: 0.15080681646810434.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.67it/s]


** End of epoch, accumulated average loss = 5.019169 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.00it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.87it/s]
[I 2023-04-11 08:28:43,803] Trial 7 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.1, 'dropout_residual': 0.2, 'dropout_relu': 0.1, 'lr_peak': 0.002}. Best is trial 4 with value: 0.15080681646810434.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 34.64it/s]


** End of epoch, accumulated average loss = 5.123816 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.99it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.96it/s]
[I 2023-04-11 08:28:46,688] Trial 8 finished with value: 0.16413623307345096 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 23.47it/s]


** End of epoch, accumulated average loss = 5.180103 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.55it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.05it/s]
[I 2023-04-11 08:28:49,939] Trial 9 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.1, 'dropout_residual': 0.2, 'dropout_relu': 0.15, 'lr_peak': 0.001}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.96it/s]


** End of epoch, accumulated average loss = 5.047159 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.50it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.00it/s]
[I 2023-04-11 08:28:53,585] Trial 10 finished with value: 0.15102318205844598 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.37it/s]


** End of epoch, accumulated average loss = 4.821285 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.76it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.92it/s]
[I 2023-04-11 08:28:56,521] Trial 11 finished with value: 0.15207968975743288 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 34.12it/s]


** End of epoch, accumulated average loss = 5.267531 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.80it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.98it/s]
[I 2023-04-11 08:28:59,438] Trial 12 finished with value: 0.15067048365225252 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.34it/s]


** End of epoch, accumulated average loss = 5.061116 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.97it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.83it/s]
[I 2023-04-11 08:29:02,371] Trial 13 finished with value: 0.1507386192342478 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.64it/s]


** End of epoch, accumulated average loss = 4.892008 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.46it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.03it/s]
[I 2023-04-11 08:29:06,192] Trial 14 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.01it/s]


** End of epoch, accumulated average loss = 4.791042 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.97it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.00it/s]
[I 2023-04-11 08:29:09,084] Trial 15 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.47it/s]


** End of epoch, accumulated average loss = 4.988226 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.98it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.75it/s]
[I 2023-04-11 08:29:12,023] Trial 16 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 34.32it/s]


** End of epoch, accumulated average loss = 5.052860 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.92it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.88it/s]
[I 2023-04-11 08:29:14,932] Trial 17 finished with value: 0.16016657323616562 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 27.77it/s]


** End of epoch, accumulated average loss = 4.766080 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.70it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.59it/s]
[I 2023-04-11 08:29:18,657] Trial 18 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 34.43it/s]


** End of epoch, accumulated average loss = 4.850325 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.94it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.79it/s]
[I 2023-04-11 08:29:21,573] Trial 19 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.001}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 34.24it/s]


** End of epoch, accumulated average loss = 4.926115 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.11it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.12it/s]
[I 2023-04-11 08:29:24,412] Trial 20 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.26it/s]


** End of epoch, accumulated average loss = 4.825424 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.69it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.01it/s]
[I 2023-04-11 08:29:27,344] Trial 21 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 34.24it/s]


** End of epoch, accumulated average loss = 4.989572 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.03it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.90it/s]
[I 2023-04-11 08:29:31,066] Trial 22 finished with value: 0.15560569516844316 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.85it/s]


** End of epoch, accumulated average loss = 5.255734 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.97it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.00it/s]
[I 2023-04-11 08:29:33,953] Trial 23 finished with value: 0.15446400988569664 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 34.48it/s]


** End of epoch, accumulated average loss = 5.070087 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.93it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.01it/s]
[I 2023-04-11 08:29:36,949] Trial 24 finished with value: 0.15417823003391923 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 34.04it/s]


** End of epoch, accumulated average loss = 5.069286 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.89it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.95it/s]
[I 2023-04-11 08:29:39,861] Trial 25 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.12it/s]


** End of epoch, accumulated average loss = 5.397709 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.12it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.73it/s]
[I 2023-04-11 08:29:43,639] Trial 26 finished with value: 0.150591069949552 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.71it/s]


** End of epoch, accumulated average loss = 4.911180 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.83it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.84it/s]
[I 2023-04-11 08:29:46,568] Trial 27 finished with value: 0.1505457282649605 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 34.97it/s]


** End of epoch, accumulated average loss = 4.986954 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.00it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.01it/s]
[I 2023-04-11 08:29:49,437] Trial 28 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 34.72it/s]


** End of epoch, accumulated average loss = 5.099115 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.13it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.25it/s]
[I 2023-04-11 08:29:52,248] Trial 29 finished with value: 0.15064778547755348 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.1, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.001}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.01it/s]


** End of epoch, accumulated average loss = 4.978308 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.04it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.63it/s]
[I 2023-04-11 08:29:55,833] Trial 30 finished with value: 0.15074998115625235 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.1, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.64it/s]


** End of epoch, accumulated average loss = 5.007558 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.02it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.91it/s]
[I 2023-04-11 08:29:58,901] Trial 31 finished with value: 0.15146925174189638 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 34.38it/s]


** End of epoch, accumulated average loss = 4.966831 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.88it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.88it/s]
[I 2023-04-11 08:30:01,811] Trial 32 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 34.28it/s]


** End of epoch, accumulated average loss = 5.199093 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.94it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.69it/s]
[I 2023-04-11 08:30:04,755] Trial 33 finished with value: 0.15109163707788772 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.00it/s]


** End of epoch, accumulated average loss = 5.216252 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.57it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.62it/s]
[I 2023-04-11 08:30:08,243] Trial 34 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.002}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 20.76it/s]


** End of epoch, accumulated average loss = 4.927081 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.95it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.71it/s]
[I 2023-04-11 08:30:11,589] Trial 35 finished with value: 0.15284677111196024 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.2, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 23.94it/s]


** End of epoch, accumulated average loss = 5.070985 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.85it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.92it/s]
[I 2023-04-11 08:30:14,636] Trial 36 finished with value: 0.15063643895458312 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0003}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.22it/s]


** End of epoch, accumulated average loss = 5.498120 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.68it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.91it/s]
[I 2023-04-11 08:30:17,604] Trial 37 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.002}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.33it/s]


** End of epoch, accumulated average loss = 4.878402 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.66it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.62it/s]
[I 2023-04-11 08:30:21,084] Trial 38 finished with value: 0.15055706112616682 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.2, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.27it/s]


** End of epoch, accumulated average loss = 4.815858 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.13it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.78it/s]
[I 2023-04-11 08:30:24,371] Trial 39 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.1, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.001}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.03it/s]


** End of epoch, accumulated average loss = 4.830574 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.71it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.69it/s]
[I 2023-04-11 08:30:27,361] Trial 40 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.81it/s]


** End of epoch, accumulated average loss = 5.093333 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.70it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.75it/s]
[I 2023-04-11 08:30:30,341] Trial 41 finished with value: 0.150591069949552 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.88it/s]


** End of epoch, accumulated average loss = 5.294186 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.71it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.61it/s]
[I 2023-04-11 08:30:33,820] Trial 42 finished with value: 0.1543805480509456 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.09it/s]


** End of epoch, accumulated average loss = 5.073009 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.78it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.82it/s]
[I 2023-04-11 08:30:37,150] Trial 43 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.24it/s]


** End of epoch, accumulated average loss = 4.832439 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.73it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.90it/s]
[I 2023-04-11 08:30:40,127] Trial 44 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.2, 'dropout_relu': 0.1, 'lr_peak': 0.002}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 34.64it/s]


** End of epoch, accumulated average loss = 4.807324 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.67it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.93it/s]
[I 2023-04-11 08:30:43,068] Trial 45 finished with value: 0.15060240963855423 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.79it/s]


** End of epoch, accumulated average loss = 4.967992 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.85it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.05it/s]
[I 2023-04-11 08:30:46,379] Trial 46 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.1, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.59it/s]


** End of epoch, accumulated average loss = 4.929207 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.45it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.79it/s]
[I 2023-04-11 08:30:49,799] Trial 47 finished with value: 0.15234613040828762 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.89it/s]


** End of epoch, accumulated average loss = 4.982873 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.86it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.03it/s]
[I 2023-04-11 08:30:52,713] Trial 48 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.10it/s]


** End of epoch, accumulated average loss = 5.223525 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.93it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.70it/s]
[I 2023-04-11 08:30:55,775] Trial 49 finished with value: 0.15063643895458312 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.2, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.70it/s]


** End of epoch, accumulated average loss = 5.060445 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.89it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.96it/s]
[I 2023-04-11 08:30:59,114] Trial 50 finished with value: 0.15643332029722332 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.001}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 23.04it/s]


** End of epoch, accumulated average loss = 5.023979 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.32it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.80it/s]
[I 2023-04-11 08:31:02,542] Trial 51 finished with value: 0.1508978421608571 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.001}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.95it/s]


** End of epoch, accumulated average loss = 4.833285 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.12it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.79it/s]
[I 2023-04-11 08:31:05,451] Trial 52 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.001}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.72it/s]


** End of epoch, accumulated average loss = 4.770245 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  8.00it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.73it/s]
[I 2023-04-11 08:31:08,393] Trial 53 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.001}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 29.88it/s]


** End of epoch, accumulated average loss = 5.617255 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.80it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.86it/s]
[I 2023-04-11 08:31:11,803] Trial 54 finished with value: 0.10886723640520386 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.001}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.64it/s]


** End of epoch, accumulated average loss = 5.061630 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.20it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.95it/s]
[I 2023-04-11 08:31:15,243] Trial 55 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.49it/s]


** End of epoch, accumulated average loss = 4.924402 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.94it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.78it/s]
[I 2023-04-11 08:31:18,185] Trial 56 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.001}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.15it/s]


** End of epoch, accumulated average loss = 5.009561 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.63it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.69it/s]
[I 2023-04-11 08:31:21,185] Trial 57 finished with value: 0.16090104585679807 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.39it/s]


** End of epoch, accumulated average loss = 5.039637 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.57it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.26it/s]
[I 2023-04-11 08:31:24,511] Trial 58 finished with value: 0.07175917620465717 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.002}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.67it/s]


** End of epoch, accumulated average loss = 5.185566 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.92it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.77it/s]
[I 2023-04-11 08:31:28,094] Trial 59 finished with value: 0.1520912547528517 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.82it/s]


** End of epoch, accumulated average loss = 4.958852 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.50it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.68it/s]
[I 2023-04-11 08:31:31,121] Trial 60 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.1, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.34it/s]


** End of epoch, accumulated average loss = 4.903965 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.76it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.76it/s]
[I 2023-04-11 08:31:34,096] Trial 61 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 34.11it/s]


** End of epoch, accumulated average loss = 5.085240 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.71it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.34it/s]
[I 2023-04-11 08:31:37,475] Trial 62 finished with value: 0.15241579027587257 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.95it/s]


** End of epoch, accumulated average loss = 4.901701 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.91it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.76it/s]
[I 2023-04-11 08:31:41,033] Trial 63 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.18it/s]


** End of epoch, accumulated average loss = 4.847884 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.62it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.73it/s]
[I 2023-04-11 08:31:44,039] Trial 64 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.95it/s]


** End of epoch, accumulated average loss = 4.770421 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.80it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.99it/s]
[I 2023-04-11 08:31:46,970] Trial 65 finished with value: 0.1505457282649605 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 34.35it/s]


** End of epoch, accumulated average loss = 4.877422 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.63it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.46it/s]
[I 2023-04-11 08:31:50,217] Trial 66 finished with value: 0.15190642564180465 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.68it/s]


** End of epoch, accumulated average loss = 5.088961 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.91it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.80it/s]
[I 2023-04-11 08:31:53,775] Trial 67 finished with value: 0.1538698261270965 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.41it/s]


** End of epoch, accumulated average loss = 5.012917 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.83it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.77it/s]
[I 2023-04-11 08:31:56,738] Trial 68 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.001}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.81it/s]


** End of epoch, accumulated average loss = 5.052479 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.79it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.77it/s]
[I 2023-04-11 08:31:59,701] Trial 69 finished with value: 0.15173355587588197 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.60it/s]


** End of epoch, accumulated average loss = 4.875139 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.76it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.62it/s]
[I 2023-04-11 08:32:02,904] Trial 70 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.2, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 23.64it/s]


** End of epoch, accumulated average loss = 4.804093 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.68it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.69it/s]
[I 2023-04-11 08:32:06,514] Trial 71 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.57it/s]


** End of epoch, accumulated average loss = 4.772326 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.75it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.79it/s]
[I 2023-04-11 08:32:09,488] Trial 72 finished with value: 0.1505683956937439 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.62it/s]


** End of epoch, accumulated average loss = 4.817432 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.65it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.57it/s]
[I 2023-04-11 08:32:12,520] Trial 73 finished with value: 0.150591069949552 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.05it/s]


** End of epoch, accumulated average loss = 4.707590 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.81it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.95it/s]
[I 2023-04-11 08:32:15,647] Trial 74 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 16.09it/s]


** End of epoch, accumulated average loss = 4.994977 **
** Elapsed time: 0:00:01**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.62it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.82it/s]
[I 2023-04-11 08:32:19,445] Trial 75 finished with value: 0.16079755587715067 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.02it/s]


** End of epoch, accumulated average loss = 4.757354 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.61it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.83it/s]
[I 2023-04-11 08:32:22,436] Trial 76 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.75it/s]


** End of epoch, accumulated average loss = 5.528111 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.96it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.64it/s]
[I 2023-04-11 08:32:25,397] Trial 77 finished with value: 0.15766653527788727 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.002}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.27it/s]


** End of epoch, accumulated average loss = 5.304409 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.60it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.65it/s]
[I 2023-04-11 08:32:28,626] Trial 78 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.002}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.00it/s]


** End of epoch, accumulated average loss = 4.863080 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.54it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.65it/s]
[I 2023-04-11 08:32:32,329] Trial 79 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.002}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.50it/s]


** End of epoch, accumulated average loss = 5.173298 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.75it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.74it/s]
[I 2023-04-11 08:32:35,318] Trial 80 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.002}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.05it/s]


** End of epoch, accumulated average loss = 5.369400 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.64it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.83it/s]
[I 2023-04-11 08:32:38,319] Trial 81 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.002}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.00it/s]


** End of epoch, accumulated average loss = 5.072888 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.78it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.98it/s]
[I 2023-04-11 08:32:41,437] Trial 82 finished with value: 0.15060240963855423 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.52it/s]


** End of epoch, accumulated average loss = 4.828266 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.60it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.98it/s]
[I 2023-04-11 08:32:45,040] Trial 83 finished with value: 0.15055706112616682 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.89it/s]


** End of epoch, accumulated average loss = 4.867214 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.85it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.75it/s]
[I 2023-04-11 08:32:48,004] Trial 84 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.1, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.002}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.12it/s]


** End of epoch, accumulated average loss = 5.511738 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.67it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.52it/s]
[I 2023-04-11 08:32:51,035] Trial 85 finished with value: 0.1505457282649605 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.001}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.96it/s]


** End of epoch, accumulated average loss = 5.099661 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.65it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.39it/s]
[I 2023-04-11 08:32:54,117] Trial 86 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.2, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 23.15it/s]


** End of epoch, accumulated average loss = 4.970158 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.64it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.14it/s]
[I 2023-04-11 08:32:57,851] Trial 87 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.50it/s]


** End of epoch, accumulated average loss = 4.854762 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.77it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.69it/s]
[I 2023-04-11 08:33:00,842] Trial 88 finished with value: 0.15055706112616682 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 23.49it/s]


** End of epoch, accumulated average loss = 4.986553 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.45it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.77it/s]
[I 2023-04-11 08:33:03,993] Trial 89 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.001}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.55it/s]


** End of epoch, accumulated average loss = 5.184594 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.85it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.31it/s]
[I 2023-04-11 08:33:07,041] Trial 90 finished with value: 0.15137753557372086 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.1, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.29it/s]


** End of epoch, accumulated average loss = 5.112109 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.69it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.04it/s]
[I 2023-04-11 08:33:10,820] Trial 91 finished with value: 0.15072725902479464 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.78it/s]


** End of epoch, accumulated average loss = 4.800663 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.49it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.56it/s]
[I 2023-04-11 08:33:13,905] Trial 92 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.25it/s]


** End of epoch, accumulated average loss = 4.881866 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.88it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.89it/s]
[I 2023-04-11 08:33:16,845] Trial 93 finished with value: 0.15081818867355404 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.34it/s]


** End of epoch, accumulated average loss = 4.665332 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.80it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.64it/s]
[I 2023-04-11 08:33:19,829] Trial 94 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 23.46it/s]


** End of epoch, accumulated average loss = 4.841494 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.65it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.63it/s]
[I 2023-04-11 08:33:23,654] Trial 95 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.83it/s]


** End of epoch, accumulated average loss = 5.085999 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.82it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.69it/s]
[I 2023-04-11 08:33:26,651] Trial 96 finished with value: 0.15331544653123802 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.29it/s]


** End of epoch, accumulated average loss = 5.284020 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.69it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.69it/s]
[I 2023-04-11 08:33:29,661] Trial 97 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.002}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.55it/s]


** End of epoch, accumulated average loss = 4.792620 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.80it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.60it/s]
[I 2023-04-11 08:33:32,673] Trial 98 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 20.34it/s]


** End of epoch, accumulated average loss = 4.933944 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.49it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.49it/s]
[I 2023-04-11 08:33:36,626] Trial 99 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.2, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.98it/s]


** End of epoch, accumulated average loss = 5.436252 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.59it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.52it/s]
[I 2023-04-11 08:33:39,680] Trial 100 finished with value: 0.1512401693889897 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.001}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 23.52it/s]


** End of epoch, accumulated average loss = 5.193287 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.67it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.46it/s]
[I 2023-04-11 08:33:42,849] Trial 101 finished with value: 0.1505457282649605 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.24it/s]


** End of epoch, accumulated average loss = 4.750725 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.68it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.37it/s]
[I 2023-04-11 08:33:45,925] Trial 102 finished with value: 0.15096618357487923 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.69it/s]


** End of epoch, accumulated average loss = 4.962964 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.60it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.18it/s]
[I 2023-04-11 08:33:49,701] Trial 103 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 8 with value: 0.16413623307345096.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.70it/s]


** End of epoch, accumulated average loss = 5.084250 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.59it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.69it/s]
[I 2023-04-11 08:33:52,726] Trial 104 finished with value: 0.16513912971678638 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 104 with value: 0.16513912971678638.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.97it/s]


** End of epoch, accumulated average loss = 4.757967 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.73it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.51it/s]
[I 2023-04-11 08:33:55,763] Trial 105 finished with value: 0.15199878400972794 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 104 with value: 0.16513912971678638.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.75it/s]


** End of epoch, accumulated average loss = 4.759791 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.68it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.09it/s]
[I 2023-04-11 08:33:58,893] Trial 106 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 104 with value: 0.16513912971678638.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 23.73it/s]


** End of epoch, accumulated average loss = 5.338321 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.39it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.38it/s]
[I 2023-04-11 08:34:02,670] Trial 107 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 104 with value: 0.16513912971678638.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.66it/s]


** End of epoch, accumulated average loss = 4.762464 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.67it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.71it/s]
[I 2023-04-11 08:34:05,675] Trial 108 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 104 with value: 0.16513912971678638.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.62it/s]


** End of epoch, accumulated average loss = 5.050960 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.73it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.53it/s]
[I 2023-04-11 08:34:08,698] Trial 109 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.1, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.002}. Best is trial 104 with value: 0.16513912971678638.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.67it/s]


** End of epoch, accumulated average loss = 4.960645 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.79it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.09it/s]
[I 2023-04-11 08:34:11,799] Trial 110 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 104 with value: 0.16513912971678638.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 20.67it/s]


** End of epoch, accumulated average loss = 4.920592 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.44it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.18it/s]
[I 2023-04-11 08:34:15,646] Trial 111 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 104 with value: 0.16513912971678638.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.17it/s]


** End of epoch, accumulated average loss = 5.103088 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.70it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.63it/s]
[I 2023-04-11 08:34:18,669] Trial 112 finished with value: 0.1505457282649605 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 104 with value: 0.16513912971678638.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.72it/s]


** End of epoch, accumulated average loss = 5.293188 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.74it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.44it/s]
[I 2023-04-11 08:34:21,711] Trial 113 finished with value: 0.1520103367028958 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 104 with value: 0.16513912971678638.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 23.43it/s]


** End of epoch, accumulated average loss = 4.967719 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.86it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.88it/s]
[I 2023-04-11 08:34:24,961] Trial 114 finished with value: 0.151285930408472 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 104 with value: 0.16513912971678638.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 20.07it/s]


** End of epoch, accumulated average loss = 4.826570 **
** Elapsed time: 0:00:01**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.41it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.51it/s]
[I 2023-04-11 08:34:28,783] Trial 115 finished with value: 0.15138899402013473 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.001}. Best is trial 104 with value: 0.16513912971678638.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.34it/s]


** End of epoch, accumulated average loss = 4.849598 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.67it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.54it/s]
[I 2023-04-11 08:34:31,833] Trial 116 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 104 with value: 0.16513912971678638.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.12it/s]


** End of epoch, accumulated average loss = 5.108455 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.59it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.51it/s]
[I 2023-04-11 08:34:34,907] Trial 117 finished with value: 0.16740604335816525 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 117 with value: 0.16740604335816525.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.84it/s]


** End of epoch, accumulated average loss = 5.181396 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.84it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.92it/s]
[I 2023-04-11 08:34:38,066] Trial 118 finished with value: 0.15097757982939533 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 117 with value: 0.16740604335816525.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.02it/s]


** End of epoch, accumulated average loss = 4.871259 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.41it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.75it/s]
[I 2023-04-11 08:34:41,825] Trial 119 finished with value: 0.1506931886678722 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 117 with value: 0.16740604335816525.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.14it/s]


** End of epoch, accumulated average loss = 5.092239 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.53it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.58it/s]
[I 2023-04-11 08:34:44,884] Trial 120 finished with value: 0.15070454374199382 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.2, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 117 with value: 0.16740604335816525.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.46it/s]


** End of epoch, accumulated average loss = 4.909460 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.73it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.68it/s]
[I 2023-04-11 08:34:47,892] Trial 121 finished with value: 0.16818028927009757 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.65it/s]


** End of epoch, accumulated average loss = 4.805949 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.71it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.81it/s]
[I 2023-04-11 08:34:51,071] Trial 122 finished with value: 0.1505683956937439 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.41it/s]


** End of epoch, accumulated average loss = 4.945550 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.45it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.63it/s]
[I 2023-04-11 08:34:54,800] Trial 123 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.15it/s]


** End of epoch, accumulated average loss = 4.961687 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.74it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.67it/s]
[I 2023-04-11 08:34:57,801] Trial 124 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.07it/s]


** End of epoch, accumulated average loss = 4.817232 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.67it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.69it/s]
[I 2023-04-11 08:35:00,806] Trial 125 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.55it/s]


** End of epoch, accumulated average loss = 5.277380 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.65it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.94it/s]
[I 2023-04-11 08:35:03,968] Trial 126 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.002}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 17.58it/s]


** End of epoch, accumulated average loss = 4.959472 **
** Elapsed time: 0:00:01**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.63it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.45it/s]
[I 2023-04-11 08:35:07,786] Trial 127 finished with value: 0.15067048365225252 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.21it/s]


** End of epoch, accumulated average loss = 5.456274 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.69it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.69it/s]
[I 2023-04-11 08:35:10,825] Trial 128 finished with value: 0.1529168896704641 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.001}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.61it/s]


** End of epoch, accumulated average loss = 4.755399 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.41it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.55it/s]
[I 2023-04-11 08:35:13,905] Trial 129 finished with value: 0.1505457282649605 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.04it/s]


** End of epoch, accumulated average loss = 5.241474 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.53it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.83it/s]
[I 2023-04-11 08:35:17,119] Trial 130 finished with value: 0.15192950470981464 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.38it/s]


** End of epoch, accumulated average loss = 4.885731 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.47it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.42it/s]
[I 2023-04-11 08:35:20,910] Trial 131 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.81it/s]


** End of epoch, accumulated average loss = 4.912060 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.56it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.56it/s]
[I 2023-04-11 08:35:23,979] Trial 132 finished with value: 0.16707042018210674 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.61it/s]


** End of epoch, accumulated average loss = 4.953468 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.54it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.61it/s]
[I 2023-04-11 08:35:27,045] Trial 133 finished with value: 0.15057973196807709 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.84it/s]


** End of epoch, accumulated average loss = 4.934977 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.66it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.69it/s]
[I 2023-04-11 08:35:30,269] Trial 134 finished with value: 0.15077271013946475 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.88it/s]


** End of epoch, accumulated average loss = 5.089869 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.37it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.55it/s]
[I 2023-04-11 08:35:34,020] Trial 135 finished with value: 0.15106881184379484 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.85it/s]


** End of epoch, accumulated average loss = 4.877568 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.65it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.66it/s]
[I 2023-04-11 08:35:37,028] Trial 136 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.1, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.70it/s]


** End of epoch, accumulated average loss = 5.376056 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.75it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.71it/s]
[I 2023-04-11 08:35:40,038] Trial 137 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.002}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.67it/s]


** End of epoch, accumulated average loss = 4.931604 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.73it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.87it/s]
[I 2023-04-11 08:35:43,196] Trial 138 finished with value: 0.1552433439416285 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 20.99it/s]


** End of epoch, accumulated average loss = 5.280728 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.55it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.36it/s]
[I 2023-04-11 08:35:46,961] Trial 139 finished with value: 0.15074998115625235 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.50it/s]


** End of epoch, accumulated average loss = 5.594990 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.59it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.52it/s]
[I 2023-04-11 08:35:50,123] Trial 140 finished with value: 0.10575296108291032 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.001}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.57it/s]


** End of epoch, accumulated average loss = 4.959478 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.79it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.65it/s]
[I 2023-04-11 08:35:53,117] Trial 141 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.36it/s]


** End of epoch, accumulated average loss = 5.195197 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.64it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.90it/s]
[I 2023-04-11 08:35:56,280] Trial 142 finished with value: 0.1589825119236884 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.50it/s]


** End of epoch, accumulated average loss = 4.730941 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.48it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.73it/s]
[I 2023-04-11 08:35:59,999] Trial 143 finished with value: 0.15062509414068384 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.92it/s]


** End of epoch, accumulated average loss = 4.815735 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.85it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.89it/s]
[I 2023-04-11 08:36:02,950] Trial 144 finished with value: 0.15060240963855423 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.68it/s]


** End of epoch, accumulated average loss = 5.191350 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.84it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.82it/s]
[I 2023-04-11 08:36:05,911] Trial 145 finished with value: 0.15535187199005748 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.52it/s]


** End of epoch, accumulated average loss = 5.023242 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.73it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.90it/s]
[I 2023-04-11 08:36:09,067] Trial 146 finished with value: 0.1568627450980392 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 20.96it/s]


** End of epoch, accumulated average loss = 4.845931 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.46it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.27it/s]
[I 2023-04-11 08:36:12,891] Trial 147 finished with value: 0.1506931886678722 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.15it/s]


** End of epoch, accumulated average loss = 5.092252 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.68it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.83it/s]
[I 2023-04-11 08:36:15,884] Trial 148 finished with value: 0.15273004963726614 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.08it/s]


** End of epoch, accumulated average loss = 4.814511 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.65it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.50it/s]
[I 2023-04-11 08:36:18,946] Trial 149 finished with value: 0.15055706112616682 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0003}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.05it/s]


** End of epoch, accumulated average loss = 5.115393 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.67it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.86it/s]
[I 2023-04-11 08:36:22,127] Trial 150 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.2, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.16it/s]


** End of epoch, accumulated average loss = 5.136765 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.56it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.57it/s]
[I 2023-04-11 08:36:25,870] Trial 151 finished with value: 0.154071335028118 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.75it/s]


** End of epoch, accumulated average loss = 4.853374 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.69it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.42it/s]
[I 2023-04-11 08:36:29,069] Trial 152 finished with value: 0.1562255897516013 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.44it/s]


** End of epoch, accumulated average loss = 4.747596 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.58it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.68it/s]
[I 2023-04-11 08:36:32,114] Trial 153 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.62it/s]


** End of epoch, accumulated average loss = 5.027668 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.65it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.46it/s]
[I 2023-04-11 08:36:35,377] Trial 154 finished with value: 0.1538935056940597 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.81it/s]


** End of epoch, accumulated average loss = 4.755236 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.66it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.65it/s]
[I 2023-04-11 08:36:39,044] Trial 155 finished with value: 0.15206812652068127 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.78it/s]


** End of epoch, accumulated average loss = 5.090093 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.71it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.68it/s]
[I 2023-04-11 08:36:42,073] Trial 156 finished with value: 0.150591069949552 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.73it/s]


** End of epoch, accumulated average loss = 5.299102 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.48it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.71it/s]
[I 2023-04-11 08:36:45,124] Trial 157 finished with value: 0.15558148580318942 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.72it/s]


** End of epoch, accumulated average loss = 5.285931 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.73it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.55it/s]
[I 2023-04-11 08:36:48,364] Trial 158 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.002}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.74it/s]


** End of epoch, accumulated average loss = 4.926818 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.66it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.57it/s]
[I 2023-04-11 08:36:52,038] Trial 159 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.15, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.45it/s]


** End of epoch, accumulated average loss = 5.001323 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.44it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.86it/s]
[I 2023-04-11 08:36:55,074] Trial 160 finished with value: 0.15060240963855423 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.91it/s]


** End of epoch, accumulated average loss = 5.094712 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.76it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.65it/s]
[I 2023-04-11 08:36:58,078] Trial 161 finished with value: 0.16159004605316313 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.03it/s]


** End of epoch, accumulated average loss = 4.845313 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.55it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.40it/s]
[I 2023-04-11 08:37:01,385] Trial 162 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.38it/s]


** End of epoch, accumulated average loss = 5.162407 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.59it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.57it/s]
[I 2023-04-11 08:37:05,099] Trial 163 finished with value: 0.15280006112002445 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.29it/s]


** End of epoch, accumulated average loss = 4.909789 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.77it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.58it/s]
[I 2023-04-11 08:37:08,112] Trial 164 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.03it/s]


** End of epoch, accumulated average loss = 4.975367 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.57it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.73it/s]
[I 2023-04-11 08:37:11,250] Trial 165 finished with value: 0.15096618357487923 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.52it/s]


** End of epoch, accumulated average loss = 5.025753 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.47it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.87it/s]
[I 2023-04-11 08:37:14,716] Trial 166 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.001}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.01it/s]


** End of epoch, accumulated average loss = 5.234565 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.99it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.58it/s]
[I 2023-04-11 08:37:18,330] Trial 167 finished with value: 0.11647545279832276 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.44it/s]


** End of epoch, accumulated average loss = 5.153457 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.69it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.54it/s]
[I 2023-04-11 08:37:21,387] Trial 168 finished with value: 0.1523809523809524 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.61it/s]


** End of epoch, accumulated average loss = 4.855397 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.59it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.59it/s]
[I 2023-04-11 08:37:24,443] Trial 169 finished with value: 0.1505457282649605 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.38it/s]


** End of epoch, accumulated average loss = 4.767811 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.61it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.80it/s]
[I 2023-04-11 08:37:27,910] Trial 170 finished with value: 0.1505457282649605 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 23.40it/s]


** End of epoch, accumulated average loss = 5.140460 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.09it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.73it/s]
[I 2023-04-11 08:37:31,408] Trial 171 finished with value: 0.1564945226917058 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.37it/s]


** End of epoch, accumulated average loss = 5.197653 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.81it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.91it/s]
[I 2023-04-11 08:37:34,380] Trial 172 finished with value: 0.15216068167985392 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.18it/s]


** End of epoch, accumulated average loss = 4.900052 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.61it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.89it/s]
[I 2023-04-11 08:37:37,373] Trial 173 finished with value: 0.1505683956937439 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.51it/s]


** End of epoch, accumulated average loss = 4.981817 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.82it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.12it/s]
[I 2023-04-11 08:37:40,708] Trial 174 finished with value: 0.15240417587441896 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.68it/s]


** End of epoch, accumulated average loss = 4.807532 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.07it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.48it/s]
[I 2023-04-11 08:37:44,317] Trial 175 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.50it/s]


** End of epoch, accumulated average loss = 4.881527 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.77it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.63it/s]
[I 2023-04-11 08:37:47,340] Trial 176 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.1, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.74it/s]


** End of epoch, accumulated average loss = 5.689771 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.63it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.59it/s]
[I 2023-04-11 08:37:50,504] Trial 177 finished with value: 0.15057973196807709 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.002}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.92it/s]


** End of epoch, accumulated average loss = 4.895876 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.59it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.82it/s]
[I 2023-04-11 08:37:53,952] Trial 178 finished with value: 0.150591069949552 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.36it/s]


** End of epoch, accumulated average loss = 5.120218 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.30it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.76it/s]
[I 2023-04-11 08:37:57,427] Trial 179 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.001}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.76it/s]


** End of epoch, accumulated average loss = 4.991794 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.74it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.52it/s]
[I 2023-04-11 08:38:00,475] Trial 180 finished with value: 0.1505683956937439 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.2, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.53it/s]


** End of epoch, accumulated average loss = 4.859775 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.51it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.79it/s]
[I 2023-04-11 08:38:03,510] Trial 181 finished with value: 0.1528584530724549 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.62it/s]


** End of epoch, accumulated average loss = 5.001847 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.80it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.09it/s]
[I 2023-04-11 08:38:06,856] Trial 182 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.36it/s]


** End of epoch, accumulated average loss = 5.019679 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.18it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.64it/s]
[I 2023-04-11 08:38:10,390] Trial 183 finished with value: 0.1505457282649605 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.67it/s]


** End of epoch, accumulated average loss = 4.998579 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.51it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.68it/s]
[I 2023-04-11 08:38:13,443] Trial 184 finished with value: 0.15064778547755348 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.21it/s]


** End of epoch, accumulated average loss = 5.064951 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.50it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.72it/s]
[I 2023-04-11 08:38:16,512] Trial 185 finished with value: 0.15062509414068384 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.60it/s]


** End of epoch, accumulated average loss = 5.098964 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.84it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.88it/s]
[I 2023-04-11 08:38:19,906] Trial 186 finished with value: 0.15173355587588197 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.11it/s]


** End of epoch, accumulated average loss = 4.807232 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.13it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.43it/s]
[I 2023-04-11 08:38:23,470] Trial 187 finished with value: 0.15055706112616682 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0003}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.64it/s]


** End of epoch, accumulated average loss = 4.864374 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.53it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.58it/s]
[I 2023-04-11 08:38:26,540] Trial 188 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.40it/s]


** End of epoch, accumulated average loss = 4.737803 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.66it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.64it/s]
[I 2023-04-11 08:38:29,589] Trial 189 finished with value: 0.15899515064790523 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.97it/s]


** End of epoch, accumulated average loss = 4.801383 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.64it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.52it/s]
[I 2023-04-11 08:38:33,287] Trial 190 finished with value: 0.1520450053215752 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 20.17it/s]


** End of epoch, accumulated average loss = 4.956354 **
** Elapsed time: 0:00:01**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.42it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.55it/s]
[I 2023-04-11 08:38:36,819] Trial 191 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.60it/s]


** End of epoch, accumulated average loss = 4.846214 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.60it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.64it/s]
[I 2023-04-11 08:38:39,869] Trial 192 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.83it/s]


** End of epoch, accumulated average loss = 4.981748 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.68it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.46it/s]
[I 2023-04-11 08:38:42,939] Trial 193 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 29.50it/s]


** End of epoch, accumulated average loss = 5.100535 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.50it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.42it/s]
[I 2023-04-11 08:38:46,556] Trial 194 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.16it/s]


** End of epoch, accumulated average loss = 5.033303 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.60it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.67it/s]
[I 2023-04-11 08:38:49,987] Trial 195 finished with value: 0.15081818867355404 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.32it/s]


** End of epoch, accumulated average loss = 5.082979 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.64it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.51it/s]
[I 2023-04-11 08:38:53,047] Trial 196 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.00it/s]


** End of epoch, accumulated average loss = 5.147571 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.44it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.45it/s]
[I 2023-04-11 08:38:56,154] Trial 197 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.002}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.90it/s]


** End of epoch, accumulated average loss = 4.799912 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.24it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.51it/s]
[I 2023-04-11 08:38:59,772] Trial 198 finished with value: 0.15076134479119555 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.94it/s]


** End of epoch, accumulated average loss = 4.795726 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.71it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.55it/s]
[I 2023-04-11 08:39:03,181] Trial 199 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.89it/s]


** End of epoch, accumulated average loss = 5.104496 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.71it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.66it/s]
[I 2023-04-11 08:39:06,218] Trial 200 finished with value: 0.15729453401494298 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.001}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.38it/s]


** End of epoch, accumulated average loss = 4.870303 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.63it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.58it/s]
[I 2023-04-11 08:39:09,279] Trial 201 finished with value: 0.15055706112616682 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.001}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.08it/s]


** End of epoch, accumulated average loss = 4.960779 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.07it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.71it/s]
[I 2023-04-11 08:39:12,884] Trial 202 finished with value: 0.15116015418335726 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.001}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 15.13it/s]


** End of epoch, accumulated average loss = 5.423961 **
** Elapsed time: 0:00:01**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.02it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.54it/s]
[I 2023-04-11 08:39:16,455] Trial 203 finished with value: 0.1525087692542321 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.001}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.56it/s]


** End of epoch, accumulated average loss = 4.822321 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.53it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.62it/s]
[I 2023-04-11 08:39:19,531] Trial 204 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.001}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.09it/s]


** End of epoch, accumulated average loss = 5.036954 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.48it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.77it/s]
[I 2023-04-11 08:39:22,586] Trial 205 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.001}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.13it/s]


** End of epoch, accumulated average loss = 4.938557 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.80it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.62it/s]
[I 2023-04-11 08:39:26,271] Trial 206 finished with value: 0.15108022359873094 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.1, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.54it/s]


** End of epoch, accumulated average loss = 4.795105 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.11it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.78it/s]
[I 2023-04-11 08:39:29,537] Trial 207 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.92it/s]


** End of epoch, accumulated average loss = 4.994600 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.60it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.75it/s]
[I 2023-04-11 08:39:32,586] Trial 208 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.82it/s]


** End of epoch, accumulated average loss = 4.758202 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.75it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.33it/s]
[I 2023-04-11 08:39:35,666] Trial 209 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.67it/s]


** End of epoch, accumulated average loss = 5.303890 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.80it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.70it/s]
[I 2023-04-11 08:39:39,312] Trial 210 finished with value: 0.15415446277169723 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 20.99it/s]


** End of epoch, accumulated average loss = 4.855481 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.21it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.69it/s]
[I 2023-04-11 08:39:42,627] Trial 211 finished with value: 0.15061375103546953 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.73it/s]


** End of epoch, accumulated average loss = 4.901083 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.44it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.41it/s]
[I 2023-04-11 08:39:45,740] Trial 212 finished with value: 0.15057973196807709 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 28.99it/s]


** End of epoch, accumulated average loss = 5.017310 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.63it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.47it/s]
[I 2023-04-11 08:39:48,851] Trial 213 finished with value: 0.1505683956937439 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.87it/s]


** End of epoch, accumulated average loss = 4.868440 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.74it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.47it/s]
[I 2023-04-11 08:39:52,598] Trial 214 finished with value: 0.15549681231534754 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.55it/s]


** End of epoch, accumulated average loss = 4.825848 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.56it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.36it/s]
[I 2023-04-11 08:39:56,050] Trial 215 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.05it/s]


** End of epoch, accumulated average loss = 5.104791 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.67it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.58it/s]
[I 2023-04-11 08:39:59,107] Trial 216 finished with value: 0.1530221882172915 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.62it/s]


** End of epoch, accumulated average loss = 4.836089 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.49it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.41it/s]
[I 2023-04-11 08:40:02,222] Trial 217 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.002}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.39it/s]


** End of epoch, accumulated average loss = 5.007232 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.46it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.40it/s]
[I 2023-04-11 08:40:06,052] Trial 218 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.001}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 25.32it/s]


** End of epoch, accumulated average loss = 4.956059 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.62it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.53it/s]
[I 2023-04-11 08:40:09,235] Trial 219 finished with value: 0.15284677111196024 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.2, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 29.49it/s]


** End of epoch, accumulated average loss = 4.791379 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.82it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.41it/s]
[I 2023-04-11 08:40:12,779] Trial 220 finished with value: 0.15093200513168817 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.63it/s]


** End of epoch, accumulated average loss = 5.061939 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.54it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.57it/s]
[I 2023-04-11 08:40:15,859] Trial 221 finished with value: 0.150591069949552 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.75it/s]


** End of epoch, accumulated average loss = 4.928781 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.45it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.79it/s]
[I 2023-04-11 08:40:19,843] Trial 222 finished with value: 0.14903129657228018 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.74it/s]


** End of epoch, accumulated average loss = 4.980672 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.36it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.49it/s]
[I 2023-04-11 08:40:22,975] Trial 223 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.89it/s]


** End of epoch, accumulated average loss = 5.233591 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.50it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.51it/s]
[I 2023-04-11 08:40:26,071] Trial 224 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.53it/s]


** End of epoch, accumulated average loss = 4.928457 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.46it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.58it/s]
[I 2023-04-11 08:40:29,151] Trial 225 finished with value: 0.1509433962264151 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 29.76it/s]


** End of epoch, accumulated average loss = 4.861189 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.43it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.99it/s]
[I 2023-04-11 08:40:33,109] Trial 226 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.37it/s]


** End of epoch, accumulated average loss = 4.942942 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.43it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.53it/s]
[I 2023-04-11 08:40:36,355] Trial 227 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.83it/s]


** End of epoch, accumulated average loss = 4.982161 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.42it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.65it/s]
[I 2023-04-11 08:40:39,447] Trial 228 finished with value: 0.15119443604475355 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.74it/s]


** End of epoch, accumulated average loss = 4.954257 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.67it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.62it/s]
[I 2023-04-11 08:40:42,500] Trial 229 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.41it/s]


** End of epoch, accumulated average loss = 4.987853 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.44it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.37it/s]
[I 2023-04-11 08:40:46,460] Trial 230 finished with value: 0.1584032947885316 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 29.29it/s]


** End of epoch, accumulated average loss = 4.977401 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.44it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.66it/s]
[I 2023-04-11 08:40:49,572] Trial 231 finished with value: 0.15113730824454016 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 33.54it/s]


** End of epoch, accumulated average loss = 5.062425 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.60it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.83it/s]
[I 2023-04-11 08:40:52,577] Trial 232 finished with value: 0.1584534938995405 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.85it/s]


** End of epoch, accumulated average loss = 4.879211 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.58it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.40it/s]
[I 2023-04-11 08:40:55,685] Trial 233 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.61it/s]


** End of epoch, accumulated average loss = 4.812226 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.63it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.56it/s]
[I 2023-04-11 08:40:59,564] Trial 234 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.84it/s]


** End of epoch, accumulated average loss = 4.727185 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.72it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.33it/s]
[I 2023-04-11 08:41:03,171] Trial 235 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.08it/s]


** End of epoch, accumulated average loss = 4.932546 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:02<00:00,  4.36it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:02<00:00,  4.97it/s]
[I 2023-04-11 08:41:08,115] Trial 236 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 19.45it/s]


** End of epoch, accumulated average loss = 5.119505 **
** Elapsed time: 0:00:01**
Using GPU device: cuda


100%|██████████| 10/10 [00:02<00:00,  3.84it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:03<00:00,  2.94it/s]
[I 2023-04-11 08:41:14,849] Trial 237 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 18.60it/s]


** End of epoch, accumulated average loss = 5.670890 **
** Elapsed time: 0:00:01**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.87it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:02<00:00,  4.89it/s]
[I 2023-04-11 08:41:19,250] Trial 238 finished with value: 0.15252039960344696 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.001}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.47it/s]


** End of epoch, accumulated average loss = 5.018061 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.48it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:02<00:00,  3.55it/s]
[I 2023-04-11 08:41:24,035] Trial 239 finished with value: 0.15246226558926665 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:01,  8.65it/s]


** End of epoch, accumulated average loss = 5.260994 **
** Elapsed time: 0:00:01**
Using GPU device: cuda


100%|██████████| 10/10 [00:02<00:00,  4.08it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:02<00:00,  4.57it/s]
[I 2023-04-11 08:41:29,999] Trial 240 finished with value: 0.15268341094740057 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.1, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.002}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 26.48it/s]


** End of epoch, accumulated average loss = 4.888068 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:02<00:00,  4.48it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.62it/s]
[I 2023-04-11 08:41:34,642] Trial 241 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 23.32it/s]


** End of epoch, accumulated average loss = 4.896985 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:02<00:00,  3.47it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:02<00:00,  4.39it/s]
[I 2023-04-11 08:41:40,423] Trial 242 finished with value: 0.15183723048891587 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 13.53it/s]


** End of epoch, accumulated average loss = 4.905338 **
** Elapsed time: 0:00:01**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.54it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.52it/s]
[I 2023-04-11 08:41:43,991] Trial 243 finished with value: 0.15055706112616682 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 29.32it/s]


** End of epoch, accumulated average loss = 5.045589 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.52it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.51it/s]
[I 2023-04-11 08:41:47,115] Trial 244 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 29.37it/s]


** End of epoch, accumulated average loss = 5.053550 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.27it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.69it/s]
[I 2023-04-11 08:41:50,709] Trial 245 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 20.48it/s]


** End of epoch, accumulated average loss = 4.958219 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.23it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.49it/s]
[I 2023-04-11 08:41:54,278] Trial 246 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.23it/s]


** End of epoch, accumulated average loss = 5.423469 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.45it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.49it/s]
[I 2023-04-11 08:41:57,397] Trial 247 finished with value: 0.1538935056940597 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.70it/s]


** End of epoch, accumulated average loss = 4.832299 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.51it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.41it/s]
[I 2023-04-11 08:42:00,519] Trial 248 finished with value: 0.15055706112616682 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 29.78it/s]


** End of epoch, accumulated average loss = 5.007655 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.43it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.57it/s]
[I 2023-04-11 08:42:04,117] Trial 249 finished with value: 0.15886885376122012 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.71it/s]


** End of epoch, accumulated average loss = 4.941151 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.49it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.51it/s]
[I 2023-04-11 08:42:07,584] Trial 250 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.92it/s]


** End of epoch, accumulated average loss = 5.014253 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.65it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.44it/s]
[I 2023-04-11 08:42:10,699] Trial 251 finished with value: 0.15216068167985392 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.15it/s]


** End of epoch, accumulated average loss = 5.430777 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.70it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.57it/s]
[I 2023-04-11 08:42:13,757] Trial 252 finished with value: 0.1535980339451655 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.001}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.83it/s]


** End of epoch, accumulated average loss = 4.853206 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.77it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.48it/s]
[I 2023-04-11 08:42:17,607] Trial 253 finished with value: 0.15070454374199382 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.06it/s]


** End of epoch, accumulated average loss = 5.076453 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.91it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.53it/s]
[I 2023-04-11 08:42:21,011] Trial 254 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.41it/s]


** End of epoch, accumulated average loss = 4.976405 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.41it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.55it/s]
[I 2023-04-11 08:42:24,123] Trial 255 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.2, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.39it/s]


** End of epoch, accumulated average loss = 5.327138 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.38it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.34it/s]
[I 2023-04-11 08:42:27,276] Trial 256 finished with value: 0.15064778547755348 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.30it/s]


** End of epoch, accumulated average loss = 4.952440 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.63it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.31it/s]
[I 2023-04-11 08:42:31,110] Trial 257 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.002}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 18.91it/s]


** End of epoch, accumulated average loss = 4.925523 **
** Elapsed time: 0:00:01**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.40it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.43it/s]
[I 2023-04-11 08:42:34,510] Trial 258 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.11it/s]


** End of epoch, accumulated average loss = 5.162317 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.62it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.43it/s]
[I 2023-04-11 08:42:37,593] Trial 259 finished with value: 0.15064778547755348 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.80it/s]


** End of epoch, accumulated average loss = 4.770469 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.64it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.46it/s]
[I 2023-04-11 08:42:40,695] Trial 260 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0003}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.61it/s]


** End of epoch, accumulated average loss = 5.228629 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.32it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.60it/s]
[I 2023-04-11 08:42:44,500] Trial 261 finished with value: 0.1505457282649605 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 20.65it/s]


** End of epoch, accumulated average loss = 4.885945 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.62it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.63it/s]
[I 2023-04-11 08:42:47,748] Trial 262 finished with value: 0.1505457282649605 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.001}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.53it/s]


** End of epoch, accumulated average loss = 5.120903 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.47it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.48it/s]
[I 2023-04-11 08:42:50,865] Trial 263 finished with value: 0.15709685020815334 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.40it/s]


** End of epoch, accumulated average loss = 4.830377 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.53it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.58it/s]
[I 2023-04-11 08:42:53,940] Trial 264 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.29it/s]


** End of epoch, accumulated average loss = 4.899445 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.28it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.72it/s]
[I 2023-04-11 08:42:57,730] Trial 265 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 24.64it/s]


** End of epoch, accumulated average loss = 4.792521 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.39it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.40it/s]
[32m[I 2023-04-11 08:43:01,145][0m Trial 266 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.[0m


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.97it/s]


** End of epoch, accumulated average loss = 5.018980 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.39it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.44it/s]
[32m[I 2023-04-11 08:43:04,278][0m Trial 267 finished with value: 0.15080681646810434 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.[0m


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.55it/s]


** End of epoch, accumulated average loss = 5.126245 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.46it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.43it/s]
[32m[I 2023-04-11 08:43:07,402][0m Trial 268 finished with value: 0.1505683956937439 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.[0m


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.02it/s]


** End of epoch, accumulated average loss = 4.923036 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.88it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.44it/s]
[32m[I 2023-04-11 08:43:11,388][0m Trial 269 finished with value: 0.15074998115625235 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.[0m


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.37it/s]


** End of epoch, accumulated average loss = 5.310558 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.48it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.34it/s]
[32m[I 2023-04-11 08:43:14,547][0m Trial 270 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.001}. Best is trial 121 with value: 0.16818028927009757.[0m


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.57it/s]


** End of epoch, accumulated average loss = 4.968565 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.37it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.41it/s]
[32m[I 2023-04-11 08:43:17,692][0m Trial 271 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.1, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.[0m


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.40it/s]


** End of epoch, accumulated average loss = 5.222113 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.53it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.39it/s]
[32m[I 2023-04-11 08:43:20,807][0m Trial 272 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.002}. Best is trial 121 with value: 0.16818028927009757.[0m


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.57it/s]


** End of epoch, accumulated average loss = 4.845398 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.77it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.47it/s]
[32m[I 2023-04-11 08:43:24,818][0m Trial 273 finished with value: 0.15090922809929827 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.[0m


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.95it/s]


** End of epoch, accumulated average loss = 4.758123 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.50it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.38it/s]
[32m[I 2023-04-11 08:43:27,952][0m Trial 274 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.[0m


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.26it/s]


** End of epoch, accumulated average loss = 4.972938 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.50it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.43it/s]
[32m[I 2023-04-11 08:43:31,078][0m Trial 275 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.[0m


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.88it/s]


** End of epoch, accumulated average loss = 4.932050 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.60it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.22it/s]
[32m[I 2023-04-11 08:43:34,210][0m Trial 276 finished with value: 0.15063643895458312 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.2, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.[0m


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.58it/s]


** End of epoch, accumulated average loss = 4.886859 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.69it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.58it/s]
[32m[I 2023-04-11 08:43:38,206][0m Trial 277 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.[0m


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.11it/s]


** End of epoch, accumulated average loss = 4.934099 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.53it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.50it/s]
[32m[I 2023-04-11 08:43:41,435][0m Trial 278 finished with value: 0.1508978421608571 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.15, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.[0m


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.84it/s]


** End of epoch, accumulated average loss = 4.946945 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.61it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.45it/s]
[32m[I 2023-04-11 08:43:44,532][0m Trial 279 finished with value: 0.15616459748574998 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.[0m


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.83it/s]


** End of epoch, accumulated average loss = 4.934424 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.39it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.33it/s]
[32m[I 2023-04-11 08:43:47,713][0m Trial 280 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.15, 'dropout_relu': 0.1, 'lr_peak': 0.001}. Best is trial 121 with value: 0.16818028927009757.[0m


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 19.36it/s]


** End of epoch, accumulated average loss = 5.130517 **
** Elapsed time: 0:00:01**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.51it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.22it/s]
[32m[I 2023-04-11 08:43:51,780][0m Trial 281 finished with value: 0.150591069949552 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.[0m


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.11it/s]


** End of epoch, accumulated average loss = 4.969996 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.47it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.32it/s]
[32m[I 2023-04-11 08:43:54,936][0m Trial 282 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.[0m


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.50it/s]


** End of epoch, accumulated average loss = 5.092847 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.22it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.28it/s]
[32m[I 2023-04-11 08:43:58,126][0m Trial 283 finished with value: 0.1520450053215752 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0003}. Best is trial 121 with value: 0.16818028927009757.[0m


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.72it/s]


** End of epoch, accumulated average loss = 5.111064 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.35it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.57it/s]
[32m[I 2023-04-11 08:44:01,444][0m Trial 284 finished with value: 0.15100037750094375 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.[0m


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 19.74it/s]


** End of epoch, accumulated average loss = 4.748421 **
** Elapsed time: 0:00:01**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.23it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.13it/s]
[32m[I 2023-04-11 08:44:05,431][0m Trial 285 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.[0m


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.16it/s]


** End of epoch, accumulated average loss = 5.518911 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.42it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.52it/s]
[32m[I 2023-04-11 08:44:08,551][0m Trial 286 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.2, 'lr_peak': 0.002}. Best is trial 121 with value: 0.16818028927009757.[0m


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.60it/s]


** End of epoch, accumulated average loss = 4.945510 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.57it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.42it/s]
[32m[I 2023-04-11 08:44:11,663][0m Trial 287 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.[0m


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.52it/s]


** End of epoch, accumulated average loss = 4.898839 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.44it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.40it/s]
[32m[I 2023-04-11 08:44:15,020][0m Trial 288 finished with value: 0.1570721746642582 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.[0m


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 22.82it/s]


** End of epoch, accumulated average loss = 4.977206 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.42it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.24it/s]
[32m[I 2023-04-11 08:44:18,823][0m Trial 289 finished with value: 0.15082956259426847 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.[0m


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 31.10it/s]


** End of epoch, accumulated average loss = 4.757313 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.29it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.35it/s]
[32m[I 2023-04-11 08:44:21,992][0m Trial 290 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.001}. Best is trial 121 with value: 0.16818028927009757.[0m


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.97it/s]


** End of epoch, accumulated average loss = 5.225272 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.51it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.24it/s]
[32m[I 2023-04-11 08:44:25,281][0m Trial 291 finished with value: 0.15112588786459122 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.[0m


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 29.91it/s]


** End of epoch, accumulated average loss = 4.870263 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.52it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.89it/s]
[32m[I 2023-04-11 08:44:28,778][0m Trial 292 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.[0m


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 20.98it/s]


** End of epoch, accumulated average loss = 5.102291 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.74it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.45it/s]
[32m[I 2023-04-11 08:44:32,488][0m Trial 293 finished with value: 0.15109163707788772 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.1, 'dropout_residual': 0.1, 'dropout_relu': 0.15, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.[0m


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.87it/s]


** End of epoch, accumulated average loss = 4.954637 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.46it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.30it/s]
[32m[I 2023-04-11 08:44:35,650][0m Trial 294 finished with value: 0.15626220798499885 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.[0m


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.53it/s]


** End of epoch, accumulated average loss = 4.911164 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.38it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.46it/s]
[32m[I 2023-04-11 08:44:38,796][0m Trial 295 finished with value: 0.15074998115625235 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.[0m


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 30.61it/s]


** End of epoch, accumulated average loss = 4.879667 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.53it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.56it/s]
[32m[I 2023-04-11 08:44:42,378][0m Trial 296 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.[0m


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 21.26it/s]


** End of epoch, accumulated average loss = 4.685280 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.32it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.29it/s]
[32m[I 2023-04-11 08:44:45,961][0m Trial 297 finished with value: 0.1505343971097396 and parameters: {'dropout_embedding': 0.1, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.[0m


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.02it/s]


** End of epoch, accumulated average loss = 4.896193 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.44it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.40it/s]
[32m[I 2023-04-11 08:44:49,091][0m Trial 298 finished with value: 0.15168752370117558 and parameters: {'dropout_embedding': 0.15, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.0005}. Best is trial 121 with value: 0.16818028927009757.[0m


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.15it/s]


** End of epoch, accumulated average loss = 5.778775 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.47it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  7.33it/s]
[32m[I 2023-04-11 08:44:52,234][0m Trial 299 finished with value: 0.15723270440251572 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.002}. Best is trial 121 with value: 0.16818028927009757.[0m


Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


10it [00:00, 32.02it/s]


** End of epoch, accumulated average loss = 5.760093 **
** Elapsed time: 0:00:00**
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.98it/s]


Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  5.55it/s]
[32m[I 2023-04-11 08:44:55,909][0m Trial 300 finished with value: 0.1586797841954935 and parameters: {'dropout_embedding': 0.2, 'dropout_attention': 0.2, 'dropout_residual': 0.1, 'dropout_relu': 0.1, 'lr_peak': 0.002}. Best is trial 121 with value: 0.16818028927009757.[0m


Best trial:
  Score: 0.168
  Params:
    dropout_embedding: 0.15
    dropout_attention: 0.2
    dropout_residual: 0.1
    dropout_relu: 0.15
    lr_peak: 0.0005


### Describe the experiments and results

*Question: What are the optimal hyperparameters according to your experiments? Add plots or other descriptions here.* 

Cross-entropy version | dropout_embedding | dropout_attention | dropout_residual | dropout_relu | lr_peak | 1 / mean_ld@1
--- | --- | --- | --- | --- | --- | --- 
No Label Smoothing | 0.15 | 0.1 | 0.15 | 0.2 | 5e-4 | 0.1702
Smoothing w/ alpha=0.05 | 0.2 | 0.15 | 0.15 | 0.1 | 1e-3 | 0.1672
Smoothing w/ alpha=0.1 | 0.15 | 0.2 | 0.1 | 0.15 | 3e-4 | 0.1680
Smoothing w/ alpha=0.2 | 0.15 | 0.2 | 0.1 | 0.15 | 5e-4 | 0.1682

**ENTER HERE YOUR ANSWER**

The search objective is the inverse of the mean Levenshtein distance (`1 / mean_ld@1`). Maximizing it is equivalent to minimizing the mean Levenshtein distance, so Optuna searches for the set of hyperparameters that produces the most accurate predictions. I do not use accuracy@1 as the objective because of the small training set: accuracy@1 is equal to zero in almost all cases.
As the table shows, after 300 trials with different values of the label smoothing parameter $\alpha$, the dropout rates for the embedding, attention, residual and ReLU layers, and the peak learning rate, the best hyperparameters found are:

+ Dropout rates:
    + dropout_embedding: 0.15
    + dropout_attention: 0.1
    + dropout_residual: 0.15
    + dropout_relu: 0.2
+ lr_peak: 0.0005
+ alpha: 0
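
The search objective described above can be sketched in plain Python. This is a minimal illustration, not the notebook's own scoring code: the helper names `levenshtein` and `inverse_mean_ld` are hypothetical, and mapping a mean distance of zero to infinity is an assumption made here to avoid division by zero.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b; insert, delete and substitute cost 1 each."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def inverse_mean_ld(predictions, references):
    """Hypothetical helper: 1 / mean Levenshtein distance; larger is better."""
    distances = [levenshtein(p, r) for p, r in zip(predictions, references)]
    mean_ld = sum(distances) / len(distances)
    # Assumption: a perfect set of predictions (mean distance 0) maps to infinity.
    return float("inf") if mean_ld == 0 else 1.0 / mean_ld


# One exact match and one prediction that is off by a single character:
# the distances are 0 and 1, so the mean is 0.5 and the score is 2.0.
score = inverse_mean_ld(["анна", "джон"], ["анна", "джан"])
```

Read this way, the best score in the table, 0.1702, corresponds to a mean edit distance of roughly 5.9 characters per name.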

The best performance was obtained with dropout rates of 0.15 for the embedding and residual layers, 0.1 for the attention layers and 0.2 for the ReLU layer. These rates appear to prevent overfitting without significantly affecting the training time or the model's accuracy. A peak learning rate of 5e-4 worked best, as it allowed the model to converge faster and reach lower training and validation loss. Training without label smoothing gave a slightly better score (0.1702) than any of the smoothed variants (0.1672 to 0.1682).
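
For reference, the effect of the label smoothing parameter $\alpha$ on the loss for a single target token can be sketched as follows. The loss used in the notebook is not shown in this section, so this is only an illustration under stated assumptions: the function name `smoothed_cross_entropy` is hypothetical, and the smoothing mass is spread uniformly over the non-target tokens (some implementations spread it over all tokens, including the target).

```python
import math


def smoothed_cross_entropy(log_probs, target, alpha):
    """Cross-entropy against a label-smoothed target distribution.

    log_probs: log-probabilities over the vocabulary for one position.
    target:    index of the correct token.
    alpha:     probability mass moved from the correct token to the others
               (assumption: spread uniformly over the non-target tokens).
    """
    vocab_size = len(log_probs)
    off_value = alpha / (vocab_size - 1)
    loss = 0.0
    for index, log_p in enumerate(log_probs):
        weight = 1.0 - alpha if index == target else off_value
        loss -= weight * log_p
    return loss


log_probs = [math.log(0.7), math.log(0.2), math.log(0.1)]
plain = smoothed_cross_entropy(log_probs, target=0, alpha=0.0)     # ordinary cross-entropy
smoothed = smoothed_cross_entropy(log_probs, target=0, alpha=0.1)  # also penalizes low off-target mass
```

With `alpha=0` the function reduces to the usual negative log-likelihood of the correct token, which is the "No Label Smoothing" row of the table.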

Overall, the model with the optimal hyperparameters achieved an accuracy@1 of 0.656 on the test set.




### Training

In [None]:
PREDS_FNAME = "preds_translit.tsv"
SCORED_PARTS = ('train', 'dev', 'train_small', 'dev_small', 'test')
TRANSLIT_PATH = "TRANSLIT"

In [None]:
top_k = 1
part2ixy = load_dataset(TRANSLIT_PATH, parts=SCORED_PARTS)
train_ids, train_strings, train_transliterations = part2ixy['train']
print('\nTraining classifier on %d examples from train set ...' % len(train_strings))
st = time.time()
params = train(train_strings, train_transliterations, 600, 0)
print('Classifier trained in %.2fs' % (time.time() - st))


Training classifier on 105371 examples from train set ...
Using GPU device: cuda

----------------------------------------
Epoch: 1
Run training...


527it [00:19, 27.11it/s]


** End of epoch, accumulated average loss = 5.557108 **
** Elapsed time: 0:00:19**

----------------------------------------
Epoch: 2
Run training...


527it [00:16, 31.98it/s]


** End of epoch, accumulated average loss = 4.489941 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 3
Run training...


527it [00:18, 28.49it/s]


** End of epoch, accumulated average loss = 4.100591 **
** Elapsed time: 0:00:19**

----------------------------------------
Epoch: 4
Run training...


527it [00:16, 32.92it/s]


** End of epoch, accumulated average loss = 3.857267 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 5
Run training...


527it [00:16, 32.69it/s]


** End of epoch, accumulated average loss = 3.715573 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 6
Run training...


527it [00:18, 28.54it/s]


** End of epoch, accumulated average loss = 3.597347 **
** Elapsed time: 0:00:18**

----------------------------------------
Epoch: 7
Run training...


527it [00:20, 25.57it/s]


** End of epoch, accumulated average loss = 3.300603 **
** Elapsed time: 0:00:21**

----------------------------------------
Epoch: 8
Run training...


527it [00:18, 27.80it/s]


** End of epoch, accumulated average loss = 2.853359 **
** Elapsed time: 0:00:19**

----------------------------------------
Epoch: 9
Run training...


527it [00:17, 30.33it/s]


** End of epoch, accumulated average loss = 2.549801 **
** Elapsed time: 0:00:17**

----------------------------------------
Epoch: 10
Run training...


527it [00:20, 25.71it/s]


** End of epoch, accumulated average loss = 2.396320 **
** Elapsed time: 0:00:21**

----------------------------------------
Epoch: 11
Run training...


527it [00:17, 30.31it/s]


** End of epoch, accumulated average loss = 2.302102 **
** Elapsed time: 0:00:17**

----------------------------------------
Epoch: 12
Run training...


527it [00:17, 30.52it/s]


** End of epoch, accumulated average loss = 2.229124 **
** Elapsed time: 0:00:17**

----------------------------------------
Epoch: 13
Run training...


527it [00:22, 23.88it/s]


** End of epoch, accumulated average loss = 2.167010 **
** Elapsed time: 0:00:22**

----------------------------------------
Epoch: 14
Run training...


527it [00:20, 25.44it/s]


** End of epoch, accumulated average loss = 2.117499 **
** Elapsed time: 0:00:21**

----------------------------------------
Epoch: 15
Run training...


527it [00:16, 31.10it/s]


** End of epoch, accumulated average loss = 2.073744 **
** Elapsed time: 0:00:17**

----------------------------------------
Epoch: 16
Run training...


527it [00:16, 32.31it/s]


** End of epoch, accumulated average loss = 2.031635 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 17
Run training...


527it [00:16, 32.07it/s]


** End of epoch, accumulated average loss = 1.996376 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 18
Run training...


527it [00:16, 32.79it/s]


** End of epoch, accumulated average loss = 1.958490 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 19
Run training...


527it [00:16, 32.58it/s]


** End of epoch, accumulated average loss = 1.929973 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 20
Run training...


527it [00:16, 31.62it/s]


** End of epoch, accumulated average loss = 1.905563 **
** Elapsed time: 0:00:17**

----------------------------------------
Epoch: 21
Run training...


527it [00:16, 32.86it/s]


** End of epoch, accumulated average loss = 1.886360 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 22
Run training...


527it [00:16, 32.87it/s]


** End of epoch, accumulated average loss = 1.868302 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 23
Run training...


527it [00:16, 31.11it/s]


** End of epoch, accumulated average loss = 1.854751 **
** Elapsed time: 0:00:17**

----------------------------------------
Epoch: 24
Run training...


527it [00:16, 32.87it/s]


** End of epoch, accumulated average loss = 1.842715 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 25
Run training...


527it [00:16, 32.92it/s]


** End of epoch, accumulated average loss = 1.832114 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 26
Run training...


527it [00:16, 32.69it/s]


** End of epoch, accumulated average loss = 1.821688 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 27
Run training...


527it [00:16, 31.74it/s]


** End of epoch, accumulated average loss = 1.811996 **
** Elapsed time: 0:00:17**

----------------------------------------
Epoch: 28
Run training...


527it [00:15, 33.54it/s]


** End of epoch, accumulated average loss = 1.803848 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 29
Run training...


527it [00:15, 33.80it/s]


** End of epoch, accumulated average loss = 1.796020 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 30
Run training...


527it [00:16, 32.20it/s]


** End of epoch, accumulated average loss = 1.786974 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 31
Run training...


527it [00:15, 34.44it/s]


** End of epoch, accumulated average loss = 1.779242 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 32
Run training...


527it [00:15, 34.40it/s]


** End of epoch, accumulated average loss = 1.771521 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 33
Run training...


527it [00:15, 33.74it/s]


** End of epoch, accumulated average loss = 1.764120 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 34
Run training...


527it [00:16, 32.05it/s]


** End of epoch, accumulated average loss = 1.758613 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 35
Run training...


527it [00:15, 34.24it/s]


** End of epoch, accumulated average loss = 1.752642 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 36
Run training...


527it [00:15, 34.20it/s]


** End of epoch, accumulated average loss = 1.748701 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 37
Run training...


527it [00:15, 33.90it/s]


** End of epoch, accumulated average loss = 1.744554 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 38
Run training...


527it [00:16, 32.22it/s]


** End of epoch, accumulated average loss = 1.739998 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 39
Run training...


527it [00:15, 33.81it/s]


** End of epoch, accumulated average loss = 1.735569 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 40
Run training...


527it [00:15, 34.25it/s]


** End of epoch, accumulated average loss = 1.733557 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 41
Run training...


527it [00:15, 33.91it/s]


** End of epoch, accumulated average loss = 1.729663 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 42
Run training...


527it [00:16, 32.43it/s]


** End of epoch, accumulated average loss = 1.726306 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 43
Run training...


527it [00:15, 34.16it/s]


** End of epoch, accumulated average loss = 1.723943 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 44
Run training...


527it [00:15, 34.09it/s]


** End of epoch, accumulated average loss = 1.720815 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 45
Run training...


527it [00:15, 33.48it/s]


** End of epoch, accumulated average loss = 1.719323 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 46
Run training...


527it [00:16, 32.64it/s]


** End of epoch, accumulated average loss = 1.716103 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 47
Run training...


527it [00:15, 33.86it/s]


** End of epoch, accumulated average loss = 1.713179 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 48
Run training...


527it [00:15, 33.79it/s]


** End of epoch, accumulated average loss = 1.711609 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 49
Run training...


527it [00:16, 32.79it/s]


** End of epoch, accumulated average loss = 1.710083 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 50
Run training...


527it [00:15, 33.42it/s]


** End of epoch, accumulated average loss = 1.707217 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 51
Run training...


527it [00:15, 34.13it/s]


** End of epoch, accumulated average loss = 1.705795 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 52
Run training...


527it [00:15, 34.44it/s]


** End of epoch, accumulated average loss = 1.703531 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 53
Run training...


527it [00:16, 32.70it/s]


** End of epoch, accumulated average loss = 1.702146 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 54
Run training...


527it [00:15, 33.94it/s]


** End of epoch, accumulated average loss = 1.700645 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 55
Run training...


527it [00:15, 33.86it/s]


** End of epoch, accumulated average loss = 1.699083 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 56
Run training...


527it [00:15, 34.08it/s]


** End of epoch, accumulated average loss = 1.697608 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 57
Run training...


527it [00:16, 32.55it/s]


** End of epoch, accumulated average loss = 1.696106 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 58
Run training...


527it [00:15, 34.01it/s]


** End of epoch, accumulated average loss = 1.694975 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 59
Run training...


527it [00:15, 34.06it/s]


** End of epoch, accumulated average loss = 1.694432 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 60
Run training...


527it [00:15, 34.21it/s]


** End of epoch, accumulated average loss = 1.692689 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 61
Run training...


527it [00:16, 32.49it/s]


** End of epoch, accumulated average loss = 1.690819 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 62
Run training...


527it [00:15, 34.44it/s]


** End of epoch, accumulated average loss = 1.689653 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 63
Run training...


527it [00:15, 34.20it/s]


** End of epoch, accumulated average loss = 1.688320 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 64
Run training...


527it [00:15, 33.88it/s]


** End of epoch, accumulated average loss = 1.687544 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 65
Run training...


527it [00:16, 32.16it/s]


** End of epoch, accumulated average loss = 1.685556 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 66
Run training...


527it [00:15, 33.86it/s]


** End of epoch, accumulated average loss = 1.684560 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 67
Run training...


527it [00:15, 33.97it/s]


** End of epoch, accumulated average loss = 1.682906 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 68
Run training...


527it [00:15, 33.99it/s]


** End of epoch, accumulated average loss = 1.682130 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 69
Run training...


527it [00:16, 32.81it/s]


** End of epoch, accumulated average loss = 1.681741 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 70
Run training...


527it [00:15, 34.11it/s]


** End of epoch, accumulated average loss = 1.680038 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 71
Run training...


527it [00:15, 33.97it/s]


** End of epoch, accumulated average loss = 1.679390 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 72
Run training...


527it [00:15, 33.41it/s]


** End of epoch, accumulated average loss = 1.678135 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 73
Run training...


527it [00:15, 33.16it/s]


** End of epoch, accumulated average loss = 1.677837 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 74
Run training...


527it [00:15, 34.03it/s]


** End of epoch, accumulated average loss = 1.676583 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 75
Run training...


527it [00:15, 34.16it/s]


** End of epoch, accumulated average loss = 1.675644 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 76
Run training...


527it [00:16, 32.91it/s]


** End of epoch, accumulated average loss = 1.674656 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 77
Run training...


527it [00:15, 33.67it/s]


** End of epoch, accumulated average loss = 1.674407 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 78
Run training...


527it [00:16, 31.58it/s]


** End of epoch, accumulated average loss = 1.673632 **
** Elapsed time: 0:00:17**

----------------------------------------
Epoch: 79
Run training...


527it [00:15, 34.24it/s]


** End of epoch, accumulated average loss = 1.673088 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 80
Run training...


527it [00:16, 32.20it/s]


** End of epoch, accumulated average loss = 1.672213 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 81
Run training...


527it [00:15, 34.58it/s]


** End of epoch, accumulated average loss = 1.671105 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 82
Run training...


527it [00:15, 34.12it/s]


** End of epoch, accumulated average loss = 1.670763 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 83
Run training...


527it [00:15, 34.36it/s]


** End of epoch, accumulated average loss = 1.669752 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 84
Run training...


527it [00:16, 32.38it/s]


** End of epoch, accumulated average loss = 1.669096 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 85
Run training...


527it [00:15, 34.42it/s]


** End of epoch, accumulated average loss = 1.668420 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 86
Run training...


527it [00:15, 33.66it/s]


** End of epoch, accumulated average loss = 1.668190 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 87
Run training...


527it [00:15, 34.40it/s]


** End of epoch, accumulated average loss = 1.666974 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 88
Run training...


527it [00:16, 32.53it/s]


** End of epoch, accumulated average loss = 1.666470 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 89
Run training...


527it [00:15, 34.35it/s]


** End of epoch, accumulated average loss = 1.666808 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 90
Run training...


527it [00:15, 33.93it/s]


** End of epoch, accumulated average loss = 1.665900 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 91
Run training...


527it [00:15, 33.85it/s]


** End of epoch, accumulated average loss = 1.665152 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 92
Run training...


527it [00:16, 32.52it/s]


** End of epoch, accumulated average loss = 1.663910 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 93
Run training...


527it [00:15, 34.07it/s]


** End of epoch, accumulated average loss = 1.664129 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 94
Run training...


527it [00:15, 34.44it/s]


** End of epoch, accumulated average loss = 1.663347 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 95
Run training...


527it [00:15, 33.84it/s]


** End of epoch, accumulated average loss = 1.663069 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 96
Run training...


527it [00:16, 32.76it/s]


** End of epoch, accumulated average loss = 1.661852 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 97
Run training...


527it [00:15, 34.49it/s]


** End of epoch, accumulated average loss = 1.661693 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 98
Run training...


527it [00:15, 34.47it/s]


** End of epoch, accumulated average loss = 1.661722 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 99
Run training...


527it [00:15, 33.76it/s]


** End of epoch, accumulated average loss = 1.661134 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 100
Run training...


527it [00:16, 32.69it/s]


** End of epoch, accumulated average loss = 1.660527 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 101
Run training...


527it [00:15, 34.01it/s]


** End of epoch, accumulated average loss = 1.660230 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 102
Run training...


527it [00:15, 34.49it/s]


** End of epoch, accumulated average loss = 1.659606 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 103
Run training...


527it [00:15, 33.46it/s]


** End of epoch, accumulated average loss = 1.659643 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 104
Run training...


527it [00:15, 33.47it/s]


** End of epoch, accumulated average loss = 1.659190 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 105
Run training...


527it [00:15, 33.96it/s]


** End of epoch, accumulated average loss = 1.659356 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 106
Run training...


527it [00:15, 34.20it/s]


** End of epoch, accumulated average loss = 1.658278 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 107
Run training...


527it [00:16, 32.85it/s]


** End of epoch, accumulated average loss = 1.658018 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 108
Run training...


527it [00:15, 33.43it/s]


** End of epoch, accumulated average loss = 1.657587 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 109
Run training...


527it [00:15, 33.94it/s]


** End of epoch, accumulated average loss = 1.657040 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 110
Run training...


527it [00:15, 34.15it/s]


** End of epoch, accumulated average loss = 1.657199 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 111
Run training...


527it [00:16, 32.76it/s]


** End of epoch, accumulated average loss = 1.656042 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 112
Run training...


527it [00:15, 33.73it/s]


** End of epoch, accumulated average loss = 1.655954 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 113
Run training...


527it [00:15, 34.18it/s]


** End of epoch, accumulated average loss = 1.655662 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 114
Run training...


527it [00:15, 34.38it/s]


** End of epoch, accumulated average loss = 1.655281 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 115
Run training...


527it [00:16, 32.82it/s]


** End of epoch, accumulated average loss = 1.655018 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 116
Run training...


527it [00:15, 33.78it/s]


** End of epoch, accumulated average loss = 1.654921 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 117
Run training...


527it [00:15, 33.49it/s]


** End of epoch, accumulated average loss = 1.654698 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 118
Run training...


527it [00:15, 34.59it/s]


** End of epoch, accumulated average loss = 1.654266 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 119
Run training...


527it [00:16, 32.48it/s]


** End of epoch, accumulated average loss = 1.653601 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 120
Run training...


527it [00:15, 34.25it/s]


** End of epoch, accumulated average loss = 1.653328 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 121
Run training...


527it [00:15, 34.13it/s]


** End of epoch, accumulated average loss = 1.652971 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 122
Run training...


527it [00:15, 34.32it/s]


** End of epoch, accumulated average loss = 1.653131 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 123
Run training...


527it [00:16, 32.33it/s]


** End of epoch, accumulated average loss = 1.652825 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 124
Run training...


527it [00:15, 34.47it/s]


** End of epoch, accumulated average loss = 1.652966 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 125
Run training...


527it [00:15, 34.14it/s]


** End of epoch, accumulated average loss = 1.651796 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 126
Run training...


527it [00:15, 33.91it/s]


** End of epoch, accumulated average loss = 1.651511 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 127
Run training...


527it [00:16, 32.15it/s]


** End of epoch, accumulated average loss = 1.651067 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 128
Run training...


527it [00:15, 34.36it/s]


** End of epoch, accumulated average loss = 1.651829 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 129
Run training...


527it [00:15, 34.26it/s]


** End of epoch, accumulated average loss = 1.651094 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 130
Run training...


527it [00:15, 34.17it/s]


** End of epoch, accumulated average loss = 1.650154 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 131
Run training...


527it [00:16, 32.69it/s]


** End of epoch, accumulated average loss = 1.650834 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 132
Run training...


527it [00:15, 34.04it/s]


** End of epoch, accumulated average loss = 1.650284 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 133
Run training...


527it [00:15, 34.18it/s]


** End of epoch, accumulated average loss = 1.649925 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 134
Run training...


527it [00:15, 34.19it/s]


** End of epoch, accumulated average loss = 1.649456 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 135
Run training...


527it [00:16, 32.54it/s]


** End of epoch, accumulated average loss = 1.649538 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 136
Run training...


527it [00:15, 33.66it/s]


** End of epoch, accumulated average loss = 1.649702 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 137
Run training...


527it [00:15, 33.72it/s]


** End of epoch, accumulated average loss = 1.649032 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 138
Run training...


527it [00:15, 33.65it/s]


** End of epoch, accumulated average loss = 1.648836 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 139
Run training...


527it [00:15, 32.97it/s]


** End of epoch, accumulated average loss = 1.648658 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 140
Run training...


527it [00:15, 34.29it/s]


** End of epoch, accumulated average loss = 1.648391 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 141
Run training...


527it [00:15, 34.12it/s]


** End of epoch, accumulated average loss = 1.647949 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 142
Run training...


527it [00:15, 33.45it/s]


** End of epoch, accumulated average loss = 1.647365 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 143
Run training...


527it [00:16, 32.84it/s]


** End of epoch, accumulated average loss = 1.647919 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 144
Run training...


527it [00:15, 33.97it/s]


** End of epoch, accumulated average loss = 1.646836 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 145
Run training...


527it [00:15, 33.69it/s]


** End of epoch, accumulated average loss = 1.646518 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 146
Run training...


527it [00:16, 32.61it/s]


** End of epoch, accumulated average loss = 1.646551 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 147
Run training...


527it [00:15, 33.63it/s]


** End of epoch, accumulated average loss = 1.646640 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 148
Run training...


527it [00:15, 33.82it/s]


** End of epoch, accumulated average loss = 1.646708 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 149
Run training...


527it [00:15, 33.65it/s]


** End of epoch, accumulated average loss = 1.646341 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 150
Run training...


527it [00:16, 32.26it/s]


** End of epoch, accumulated average loss = 1.646038 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 151
Run training...


527it [00:15, 34.14it/s]


** End of epoch, accumulated average loss = 1.645508 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 152
Run training...


527it [00:15, 33.78it/s]


** End of epoch, accumulated average loss = 1.645390 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 153
Run training...


527it [00:15, 33.82it/s]


** End of epoch, accumulated average loss = 1.645472 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 154
Run training...


527it [00:16, 32.42it/s]


** End of epoch, accumulated average loss = 1.644831 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 155
Run training...


527it [00:15, 34.34it/s]


** End of epoch, accumulated average loss = 1.644976 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 156
Run training...


527it [00:15, 34.05it/s]


** End of epoch, accumulated average loss = 1.644729 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 157
Run training...


527it [00:15, 34.07it/s]


** End of epoch, accumulated average loss = 1.645049 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 158
Run training...


527it [00:16, 32.16it/s]


** End of epoch, accumulated average loss = 1.644665 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 159
Run training...


527it [00:15, 34.20it/s]


** End of epoch, accumulated average loss = 1.644026 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 160
Run training...


527it [00:15, 33.91it/s]


** End of epoch, accumulated average loss = 1.643815 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 161
Run training...


527it [00:15, 33.66it/s]


** End of epoch, accumulated average loss = 1.644013 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 162
Run training...


527it [00:16, 32.33it/s]


** End of epoch, accumulated average loss = 1.643656 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 163
Run training...


527it [00:15, 33.93it/s]


** End of epoch, accumulated average loss = 1.643561 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 164
Run training...


527it [00:15, 33.71it/s]


** End of epoch, accumulated average loss = 1.643132 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 165
Run training...


527it [00:16, 32.68it/s]


** End of epoch, accumulated average loss = 1.643554 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 166
Run training...


527it [00:15, 33.08it/s]


** End of epoch, accumulated average loss = 1.642886 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 167
Run training...


527it [00:15, 34.01it/s]


** End of epoch, accumulated average loss = 1.643064 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 168
Run training...


527it [00:15, 33.68it/s]


** End of epoch, accumulated average loss = 1.643022 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 169
Run training...


527it [00:16, 32.47it/s]


** End of epoch, accumulated average loss = 1.642124 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 170
Run training...


527it [00:15, 33.59it/s]


** End of epoch, accumulated average loss = 1.641777 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 171
Run training...


527it [00:15, 33.72it/s]


** End of epoch, accumulated average loss = 1.641751 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 172
Run training...


527it [00:15, 33.47it/s]


** End of epoch, accumulated average loss = 1.641554 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 173
Run training...


527it [00:16, 32.03it/s]


** End of epoch, accumulated average loss = 1.641393 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 174
Run training...


527it [00:15, 33.65it/s]


** End of epoch, accumulated average loss = 1.642090 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 175
Run training...


527it [00:15, 33.86it/s]


** End of epoch, accumulated average loss = 1.641166 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 176
Run training...


527it [00:15, 33.92it/s]


** End of epoch, accumulated average loss = 1.640929 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 177
Run training...


527it [00:16, 31.91it/s]


** End of epoch, accumulated average loss = 1.641309 **
** Elapsed time: 0:00:17**

----------------------------------------
Epoch: 178
Run training...


527it [00:15, 33.86it/s]


** End of epoch, accumulated average loss = 1.640880 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 179
Run training...


527it [00:15, 33.98it/s]


** End of epoch, accumulated average loss = 1.640810 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 180
Run training...


527it [00:15, 33.90it/s]


** End of epoch, accumulated average loss = 1.641254 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 181
Run training...


527it [00:16, 32.94it/s]


** End of epoch, accumulated average loss = 1.640864 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 182
Run training...


527it [00:15, 33.87it/s]


** End of epoch, accumulated average loss = 1.641085 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 183
Run training...


527it [00:15, 33.60it/s]


** End of epoch, accumulated average loss = 1.640833 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 184
Run training...


527it [00:15, 33.30it/s]


** End of epoch, accumulated average loss = 1.640009 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 185
Run training...


527it [00:15, 33.19it/s]


** End of epoch, accumulated average loss = 1.640158 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 186
Run training...


527it [00:15, 34.24it/s]


** End of epoch, accumulated average loss = 1.639708 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 187
Run training...


527it [00:15, 33.87it/s]


** End of epoch, accumulated average loss = 1.639771 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 188
Run training...


527it [00:16, 32.76it/s]


** End of epoch, accumulated average loss = 1.639671 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 189
Run training...


527it [00:15, 33.50it/s]


** End of epoch, accumulated average loss = 1.639037 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 190
Run training...


527it [00:15, 34.36it/s]


** End of epoch, accumulated average loss = 1.639334 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 191
Run training...


527it [00:15, 33.89it/s]


** End of epoch, accumulated average loss = 1.639068 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 192
Run training...


527it [00:16, 32.65it/s]


** End of epoch, accumulated average loss = 1.639027 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 193
Run training...


527it [00:15, 33.94it/s]


** End of epoch, accumulated average loss = 1.638987 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 194
Run training...


527it [00:15, 33.98it/s]


** End of epoch, accumulated average loss = 1.638824 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 195
Run training...


527it [00:15, 34.24it/s]


** End of epoch, accumulated average loss = 1.638296 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 196
Run training...


527it [00:16, 32.86it/s]


** End of epoch, accumulated average loss = 1.638227 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 197
Run training...


527it [00:15, 33.76it/s]


** End of epoch, accumulated average loss = 1.638049 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 198
Run training...


527it [00:15, 34.00it/s]


** End of epoch, accumulated average loss = 1.638457 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 199
Run training...


527it [00:15, 33.92it/s]


** End of epoch, accumulated average loss = 1.638233 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 200
Run training...


527it [00:16, 32.87it/s]


** End of epoch, accumulated average loss = 1.637749 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 201
Run training...


527it [00:15, 34.17it/s]


** End of epoch, accumulated average loss = 1.637557 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 202
Run training...


527it [00:15, 33.98it/s]


** End of epoch, accumulated average loss = 1.637954 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 203
Run training...


527it [00:15, 34.17it/s]


** End of epoch, accumulated average loss = 1.637701 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 204
Run training...


527it [00:16, 32.47it/s]


** End of epoch, accumulated average loss = 1.637668 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 205
Run training...


527it [00:15, 33.77it/s]


** End of epoch, accumulated average loss = 1.637274 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 206
Run training...


527it [00:15, 34.00it/s]


** End of epoch, accumulated average loss = 1.637150 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 207
Run training...


527it [00:15, 34.09it/s]


** End of epoch, accumulated average loss = 1.637081 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 208
Run training...


527it [00:16, 32.58it/s]


** End of epoch, accumulated average loss = 1.637007 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 209
Run training...


527it [00:15, 33.64it/s]


** End of epoch, accumulated average loss = 1.636642 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 210
Run training...


527it [00:15, 33.96it/s]


** End of epoch, accumulated average loss = 1.636787 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 211
Run training...


527it [00:15, 33.99it/s]


** End of epoch, accumulated average loss = 1.636541 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 212
Run training...


527it [00:16, 32.62it/s]


** End of epoch, accumulated average loss = 1.636768 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 213
Run training...


527it [00:15, 33.97it/s]


** End of epoch, accumulated average loss = 1.636440 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 214
Run training...


527it [00:15, 34.09it/s]


** End of epoch, accumulated average loss = 1.636403 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 215
Run training...


527it [00:15, 33.97it/s]


** End of epoch, accumulated average loss = 1.636448 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 216
Run training...


527it [00:16, 32.82it/s]


** End of epoch, accumulated average loss = 1.635951 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 217
Run training...


527it [00:15, 34.26it/s]


** End of epoch, accumulated average loss = 1.636169 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 218
Run training...


527it [00:15, 34.00it/s]


** End of epoch, accumulated average loss = 1.635573 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 219
Run training...


527it [00:15, 33.55it/s]


** End of epoch, accumulated average loss = 1.636140 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 220
Run training...


527it [00:16, 32.68it/s]


** End of epoch, accumulated average loss = 1.635819 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 221
Run training...


527it [00:15, 34.23it/s]


** End of epoch, accumulated average loss = 1.634901 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 222
Run training...


527it [00:15, 34.29it/s]


** End of epoch, accumulated average loss = 1.635295 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 223
Run training...


527it [00:15, 33.52it/s]


** End of epoch, accumulated average loss = 1.635344 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 224
Run training...


527it [00:16, 32.64it/s]


** End of epoch, accumulated average loss = 1.634577 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 225
Run training...


527it [00:15, 33.90it/s]


** End of epoch, accumulated average loss = 1.634534 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 226
Run training...


527it [00:15, 34.14it/s]


** End of epoch, accumulated average loss = 1.634952 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 227
Run training...


527it [00:15, 33.03it/s]


** End of epoch, accumulated average loss = 1.634834 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 228
Run training...


527it [00:15, 33.44it/s]


** End of epoch, accumulated average loss = 1.635042 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 229
Run training...


527it [00:15, 34.25it/s]


** End of epoch, accumulated average loss = 1.634342 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 230
Run training...


527it [00:15, 33.92it/s]


** End of epoch, accumulated average loss = 1.633951 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 231
Run training...


527it [00:16, 32.91it/s]


** End of epoch, accumulated average loss = 1.633949 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 232
Run training...


527it [00:15, 33.82it/s]


** End of epoch, accumulated average loss = 1.634299 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 233
Run training...


527it [00:15, 34.27it/s]


** End of epoch, accumulated average loss = 1.633804 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 234
Run training...


527it [00:15, 33.95it/s]


** End of epoch, accumulated average loss = 1.634097 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 235
Run training...


527it [00:16, 32.22it/s]


** End of epoch, accumulated average loss = 1.634144 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 236
Run training...


527it [00:15, 33.91it/s]


** End of epoch, accumulated average loss = 1.633746 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 237
Run training...


527it [00:15, 34.30it/s]


** End of epoch, accumulated average loss = 1.633493 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 238
Run training...


527it [00:15, 34.07it/s]


** End of epoch, accumulated average loss = 1.633541 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 239
Run training...


527it [00:16, 32.42it/s]


** End of epoch, accumulated average loss = 1.633271 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 240
Run training...


527it [00:15, 33.82it/s]


** End of epoch, accumulated average loss = 1.633179 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 241
Run training...


527it [00:15, 34.30it/s]


** End of epoch, accumulated average loss = 1.632868 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 242
Run training...


527it [00:15, 34.25it/s]


** End of epoch, accumulated average loss = 1.632975 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 243
Run training...


527it [00:16, 32.11it/s]


** End of epoch, accumulated average loss = 1.633141 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 244
Run training...


527it [00:15, 33.22it/s]


** End of epoch, accumulated average loss = 1.632993 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 245
Run training...


527it [00:15, 34.22it/s]


** End of epoch, accumulated average loss = 1.633515 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 246
Run training...


527it [00:15, 33.67it/s]


** End of epoch, accumulated average loss = 1.632857 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 247
Run training...


527it [00:16, 32.51it/s]


** End of epoch, accumulated average loss = 1.633030 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 248
Run training...


527it [00:15, 33.89it/s]


** End of epoch, accumulated average loss = 1.632243 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 249
Run training...


527it [00:15, 34.15it/s]


** End of epoch, accumulated average loss = 1.632200 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 250
Run training...


527it [00:16, 32.54it/s]


** End of epoch, accumulated average loss = 1.632095 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 251
Run training...


527it [00:16, 32.84it/s]


** End of epoch, accumulated average loss = 1.632246 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 252
Run training...


527it [00:15, 33.87it/s]


** End of epoch, accumulated average loss = 1.631886 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 253
Run training...


527it [00:15, 34.15it/s]


** End of epoch, accumulated average loss = 1.631961 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 254
Run training...


527it [00:16, 32.61it/s]


** End of epoch, accumulated average loss = 1.631732 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 255
Run training...


527it [00:15, 33.51it/s]


** End of epoch, accumulated average loss = 1.632050 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 256
Run training...


527it [00:15, 33.72it/s]


** End of epoch, accumulated average loss = 1.631447 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 257
Run training...


527it [00:15, 34.31it/s]


** End of epoch, accumulated average loss = 1.631851 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 258
Run training...


527it [00:16, 32.41it/s]


** End of epoch, accumulated average loss = 1.631766 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 259
Run training...


527it [00:15, 33.75it/s]


** End of epoch, accumulated average loss = 1.631540 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 260
Run training...


527it [00:15, 33.21it/s]


** End of epoch, accumulated average loss = 1.631375 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 261
Run training...


527it [00:15, 33.92it/s]


** End of epoch, accumulated average loss = 1.631567 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 262
Run training...


527it [00:16, 32.00it/s]


** End of epoch, accumulated average loss = 1.631743 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 263
Run training...


527it [00:15, 33.38it/s]


** End of epoch, accumulated average loss = 1.631272 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 264
Run training...


527it [00:15, 33.46it/s]


** End of epoch, accumulated average loss = 1.630581 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 265
Run training...


527it [00:16, 32.80it/s]


** End of epoch, accumulated average loss = 1.630523 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 266
Run training...


527it [00:16, 32.84it/s]


** End of epoch, accumulated average loss = 1.630859 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 267
Run training...


527it [00:15, 33.70it/s]


** End of epoch, accumulated average loss = 1.631001 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 268
Run training...


527it [00:15, 33.43it/s]


** End of epoch, accumulated average loss = 1.630862 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 269
Run training...


527it [00:16, 32.11it/s]


** End of epoch, accumulated average loss = 1.630570 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 270
Run training...


527it [00:15, 33.76it/s]


** End of epoch, accumulated average loss = 1.630232 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 271
Run training...


527it [00:15, 33.19it/s]


** End of epoch, accumulated average loss = 1.630513 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 272
Run training...


527it [00:15, 33.57it/s]


** End of epoch, accumulated average loss = 1.630486 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 273
Run training...


527it [00:16, 31.97it/s]


** End of epoch, accumulated average loss = 1.630610 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 274
Run training...


527it [00:15, 33.88it/s]


** End of epoch, accumulated average loss = 1.630046 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 275
Run training...


527it [00:15, 33.39it/s]


** End of epoch, accumulated average loss = 1.629877 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 276
Run training...


527it [00:16, 32.84it/s]


** End of epoch, accumulated average loss = 1.629698 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 277
Run training...


527it [00:16, 32.64it/s]


** End of epoch, accumulated average loss = 1.629897 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 278
Run training...


527it [00:15, 33.79it/s]


** End of epoch, accumulated average loss = 1.629819 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 279
Run training...


527it [00:15, 33.59it/s]


** End of epoch, accumulated average loss = 1.630026 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 280
Run training...


527it [00:16, 32.08it/s]


** End of epoch, accumulated average loss = 1.629573 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 281
Run training...


527it [00:15, 33.10it/s]


** End of epoch, accumulated average loss = 1.629235 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 282
Run training...


527it [00:15, 33.98it/s]


** End of epoch, accumulated average loss = 1.629266 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 283
Run training...


527it [00:15, 33.37it/s]


** End of epoch, accumulated average loss = 1.628910 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 284
Run training...


527it [00:16, 31.99it/s]


** End of epoch, accumulated average loss = 1.629333 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 285
Run training...


527it [00:15, 33.40it/s]


** End of epoch, accumulated average loss = 1.629370 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 286
Run training...


527it [00:15, 33.54it/s]


** End of epoch, accumulated average loss = 1.629335 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 287
Run training...


527it [00:16, 32.79it/s]


** End of epoch, accumulated average loss = 1.628471 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 288
Run training...


527it [00:16, 32.71it/s]


** End of epoch, accumulated average loss = 1.628992 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 289
Run training...


527it [00:15, 33.70it/s]


** End of epoch, accumulated average loss = 1.628971 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 290
Run training...


527it [00:15, 33.73it/s]


** End of epoch, accumulated average loss = 1.628611 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 291
Run training...


527it [00:16, 32.23it/s]


** End of epoch, accumulated average loss = 1.628364 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 292
Run training...


527it [00:15, 33.48it/s]


** End of epoch, accumulated average loss = 1.628545 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 293
Run training...


527it [00:15, 33.96it/s]


** End of epoch, accumulated average loss = 1.628385 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 294
Run training...


527it [00:15, 33.46it/s]


** End of epoch, accumulated average loss = 1.628267 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 295
Run training...


527it [00:16, 32.30it/s]


** End of epoch, accumulated average loss = 1.628243 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 296
Run training...


527it [00:15, 33.59it/s]


** End of epoch, accumulated average loss = 1.628079 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 297
Run training...


527it [00:15, 33.51it/s]


** End of epoch, accumulated average loss = 1.628081 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 298
Run training...


527it [00:15, 33.34it/s]


** End of epoch, accumulated average loss = 1.628780 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 299
Run training...


527it [00:16, 32.37it/s]


** End of epoch, accumulated average loss = 1.627528 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 300
Run training...


527it [00:15, 33.64it/s]


** End of epoch, accumulated average loss = 1.627723 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 301
Run training...


527it [00:15, 33.60it/s]


** End of epoch, accumulated average loss = 1.628098 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 302
Run training...


527it [00:16, 32.58it/s]


** End of epoch, accumulated average loss = 1.628154 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 303
Run training...


527it [00:15, 33.43it/s]


** End of epoch, accumulated average loss = 1.627661 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 304
Run training...


527it [00:15, 33.61it/s]


** End of epoch, accumulated average loss = 1.627597 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 305
Run training...


527it [00:15, 33.94it/s]


** End of epoch, accumulated average loss = 1.627206 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 306
Run training...


527it [00:16, 31.70it/s]


** End of epoch, accumulated average loss = 1.627774 **
** Elapsed time: 0:00:17**

----------------------------------------
Epoch: 307
Run training...


527it [00:15, 33.74it/s]


** End of epoch, accumulated average loss = 1.627138 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 308
Run training...


527it [00:15, 33.34it/s]


** End of epoch, accumulated average loss = 1.627092 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 309
Run training...


527it [00:15, 33.67it/s]


** End of epoch, accumulated average loss = 1.627492 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 310
Run training...


527it [00:16, 32.33it/s]


** End of epoch, accumulated average loss = 1.626879 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 311
Run training...


527it [00:15, 33.89it/s]


** End of epoch, accumulated average loss = 1.627433 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 312
Run training...


527it [00:15, 33.33it/s]


** End of epoch, accumulated average loss = 1.626761 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 313
Run training...


527it [00:16, 32.71it/s]


** End of epoch, accumulated average loss = 1.626830 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 314
Run training...


527it [00:16, 32.56it/s]


** End of epoch, accumulated average loss = 1.627219 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 315
Run training...


527it [00:15, 33.79it/s]


** End of epoch, accumulated average loss = 1.627157 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 316
Run training...


527it [00:15, 33.44it/s]


** End of epoch, accumulated average loss = 1.626683 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 317
Run training...


527it [00:16, 32.00it/s]


** End of epoch, accumulated average loss = 1.626629 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 318
Run training...


527it [00:15, 33.56it/s]


** End of epoch, accumulated average loss = 1.626174 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 319
Run training...


527it [00:15, 34.09it/s]


** End of epoch, accumulated average loss = 1.626526 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 320
Run training...


527it [00:15, 34.03it/s]


** End of epoch, accumulated average loss = 1.626172 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 321
Run training...


527it [00:16, 32.30it/s]


** End of epoch, accumulated average loss = 1.626319 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 322
Run training...


527it [00:15, 33.91it/s]


** End of epoch, accumulated average loss = 1.626261 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 323
Run training...


527it [00:15, 33.94it/s]


** End of epoch, accumulated average loss = 1.625604 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 324
Run training...


527it [00:15, 33.54it/s]


** End of epoch, accumulated average loss = 1.626360 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 325
Run training...


527it [00:16, 32.03it/s]


** End of epoch, accumulated average loss = 1.625631 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 326
Run training...


527it [00:15, 33.80it/s]


** End of epoch, accumulated average loss = 1.625611 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 327
Run training...


527it [00:15, 33.88it/s]


** End of epoch, accumulated average loss = 1.626354 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 328
Run training...


527it [00:15, 33.70it/s]


** End of epoch, accumulated average loss = 1.625578 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 329
Run training...


527it [00:16, 32.20it/s]


** End of epoch, accumulated average loss = 1.625915 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 330
Run training...


527it [00:15, 33.80it/s]


** End of epoch, accumulated average loss = 1.625943 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 331
Run training...


527it [00:16, 32.40it/s]


** End of epoch, accumulated average loss = 1.625790 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 332
Run training...


527it [00:16, 32.62it/s]


** End of epoch, accumulated average loss = 1.625426 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 333
Run training...


527it [00:15, 33.26it/s]


** End of epoch, accumulated average loss = 1.625311 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 334
Run training...


527it [00:15, 33.87it/s]


** End of epoch, accumulated average loss = 1.625148 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 335
Run training...


527it [00:15, 33.65it/s]


** End of epoch, accumulated average loss = 1.625146 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 336
Run training...


527it [00:16, 32.13it/s]


** End of epoch, accumulated average loss = 1.625047 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 337
Run training...


527it [00:15, 33.71it/s]


** End of epoch, accumulated average loss = 1.624984 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 338
Run training...


527it [00:15, 33.74it/s]


** End of epoch, accumulated average loss = 1.625111 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 339
Run training...


527it [00:15, 33.88it/s]


** End of epoch, accumulated average loss = 1.624931 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 340
Run training...


527it [00:16, 32.47it/s]


** End of epoch, accumulated average loss = 1.625021 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 341
Run training...


527it [00:15, 33.94it/s]


** End of epoch, accumulated average loss = 1.624937 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 342
Run training...


527it [00:15, 33.22it/s]


** End of epoch, accumulated average loss = 1.624548 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 343
Run training...


527it [00:15, 33.44it/s]


** End of epoch, accumulated average loss = 1.624719 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 344
Run training...


527it [00:16, 32.36it/s]


** End of epoch, accumulated average loss = 1.624934 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 345
Run training...


527it [00:15, 33.68it/s]


** End of epoch, accumulated average loss = 1.624992 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 346
Run training...


527it [00:15, 33.26it/s]


** End of epoch, accumulated average loss = 1.624273 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 347
Run training...


527it [00:16, 32.23it/s]


** End of epoch, accumulated average loss = 1.624342 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 348
Run training...


527it [00:15, 33.44it/s]


** End of epoch, accumulated average loss = 1.624409 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 349
Run training...


527it [00:15, 33.95it/s]


** End of epoch, accumulated average loss = 1.624266 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 350
Run training...


527it [00:15, 33.64it/s]


** End of epoch, accumulated average loss = 1.624613 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 351
Run training...


527it [00:16, 31.96it/s]


** End of epoch, accumulated average loss = 1.624136 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 352
Run training...


527it [00:15, 33.86it/s]


** End of epoch, accumulated average loss = 1.624172 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 353
Run training...


527it [00:15, 33.79it/s]


** End of epoch, accumulated average loss = 1.624063 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 354
Run training...


527it [00:15, 33.48it/s]


** End of epoch, accumulated average loss = 1.624277 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 355
Run training...


527it [00:16, 32.09it/s]


** End of epoch, accumulated average loss = 1.623999 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 356
Run training...


527it [00:15, 34.16it/s]


** End of epoch, accumulated average loss = 1.623884 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 357
Run training...


527it [00:15, 33.58it/s]


** End of epoch, accumulated average loss = 1.623630 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 358
Run training...


527it [00:16, 32.89it/s]


** End of epoch, accumulated average loss = 1.623549 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 359
Run training...


527it [00:16, 32.82it/s]


** End of epoch, accumulated average loss = 1.623770 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 360
Run training...


527it [00:15, 33.92it/s]


** End of epoch, accumulated average loss = 1.623652 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 361
Run training...


527it [00:15, 33.70it/s]


** End of epoch, accumulated average loss = 1.623385 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 362
Run training...


527it [00:16, 32.11it/s]


** End of epoch, accumulated average loss = 1.623137 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 363
Run training...


527it [00:15, 33.20it/s]


** End of epoch, accumulated average loss = 1.623390 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 364
Run training...


527it [00:15, 33.85it/s]


** End of epoch, accumulated average loss = 1.622984 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 365
Run training...


527it [00:15, 33.63it/s]


** End of epoch, accumulated average loss = 1.623534 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 366
Run training...


527it [00:16, 32.15it/s]


** End of epoch, accumulated average loss = 1.623114 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 367
Run training...


527it [00:15, 33.69it/s]


** End of epoch, accumulated average loss = 1.623295 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 368
Run training...


527it [00:15, 33.87it/s]


** End of epoch, accumulated average loss = 1.623301 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 369
Run training...


527it [00:15, 33.60it/s]


** End of epoch, accumulated average loss = 1.623007 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 370
Run training...


527it [00:16, 32.05it/s]


** End of epoch, accumulated average loss = 1.622673 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 371
Run training...


527it [00:15, 33.94it/s]


** End of epoch, accumulated average loss = 1.622688 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 372
Run training...


527it [00:15, 33.80it/s]


** End of epoch, accumulated average loss = 1.622492 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 373
Run training...


527it [00:16, 32.88it/s]


** End of epoch, accumulated average loss = 1.622729 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 374
Run training...


527it [00:16, 32.59it/s]


** End of epoch, accumulated average loss = 1.622616 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 375
Run training...


527it [00:15, 34.08it/s]


** End of epoch, accumulated average loss = 1.622495 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 376
Run training...


527it [00:15, 33.59it/s]


** End of epoch, accumulated average loss = 1.622770 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 377
Run training...


527it [00:16, 32.58it/s]


** End of epoch, accumulated average loss = 1.622224 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 378
Run training...


527it [00:16, 32.77it/s]


** End of epoch, accumulated average loss = 1.622338 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 379
Run training...


527it [00:15, 33.73it/s]


** End of epoch, accumulated average loss = 1.622720 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 380
Run training...


527it [00:15, 33.72it/s]


** End of epoch, accumulated average loss = 1.622299 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 381
Run training...


527it [00:16, 31.78it/s]


** End of epoch, accumulated average loss = 1.622657 **
** Elapsed time: 0:00:17**

----------------------------------------
Epoch: 382
Run training...


527it [00:15, 33.83it/s]


** End of epoch, accumulated average loss = 1.622823 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 383
Run training...


527it [00:15, 33.42it/s]


** End of epoch, accumulated average loss = 1.622000 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 384
Run training...


527it [00:15, 33.49it/s]


** End of epoch, accumulated average loss = 1.621816 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 385
Run training...


527it [00:16, 31.98it/s]


** End of epoch, accumulated average loss = 1.622151 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 386
Run training...


527it [00:15, 33.77it/s]


** End of epoch, accumulated average loss = 1.622067 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 387
Run training...


527it [00:15, 33.48it/s]


** End of epoch, accumulated average loss = 1.621732 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 388
Run training...


527it [00:16, 32.86it/s]


** End of epoch, accumulated average loss = 1.622071 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 389
Run training...


527it [00:16, 32.84it/s]


** End of epoch, accumulated average loss = 1.621965 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 390
Run training...


527it [00:15, 33.96it/s]


** End of epoch, accumulated average loss = 1.621269 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 391
Run training...


527it [00:15, 33.80it/s]


** End of epoch, accumulated average loss = 1.621921 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 392
Run training...


527it [00:16, 32.23it/s]


** End of epoch, accumulated average loss = 1.621753 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 393
Run training...


527it [00:15, 33.54it/s]


** End of epoch, accumulated average loss = 1.621381 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 394
Run training...


527it [00:15, 34.22it/s]


** End of epoch, accumulated average loss = 1.621002 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 395
Run training...


527it [00:15, 34.30it/s]


** End of epoch, accumulated average loss = 1.621003 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 396
Run training...


527it [00:16, 32.79it/s]


** End of epoch, accumulated average loss = 1.621334 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 397
Run training...


527it [00:15, 34.07it/s]


** End of epoch, accumulated average loss = 1.621349 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 398
Run training...


527it [00:15, 34.04it/s]


** End of epoch, accumulated average loss = 1.621064 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 399
Run training...


527it [00:15, 34.26it/s]


** End of epoch, accumulated average loss = 1.621611 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 400
Run training...


527it [00:16, 32.48it/s]


** End of epoch, accumulated average loss = 1.620602 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 401
Run training...


527it [00:15, 33.79it/s]


** End of epoch, accumulated average loss = 1.621008 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 402
Run training...


527it [00:15, 34.09it/s]


** End of epoch, accumulated average loss = 1.620827 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 403
Run training...


527it [00:15, 33.89it/s]


** End of epoch, accumulated average loss = 1.620912 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 404
Run training...


527it [00:16, 32.32it/s]


** End of epoch, accumulated average loss = 1.621025 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 405
Run training...


527it [00:15, 34.25it/s]


** End of epoch, accumulated average loss = 1.620381 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 406
Run training...


527it [00:15, 34.37it/s]


** End of epoch, accumulated average loss = 1.620322 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 407
Run training...


527it [00:15, 34.34it/s]


** End of epoch, accumulated average loss = 1.620671 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 408
Run training...


527it [00:16, 32.44it/s]


** End of epoch, accumulated average loss = 1.620504 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 409
Run training...


527it [00:15, 34.03it/s]


** End of epoch, accumulated average loss = 1.620775 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 410
Run training...


527it [00:15, 34.18it/s]


** End of epoch, accumulated average loss = 1.620380 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 411
Run training...


527it [00:15, 34.11it/s]


** End of epoch, accumulated average loss = 1.620268 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 412
Run training...


527it [00:15, 32.97it/s]


** End of epoch, accumulated average loss = 1.620653 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 413
Run training...


527it [00:15, 33.93it/s]


** End of epoch, accumulated average loss = 1.620236 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 414
Run training...


527it [00:15, 33.98it/s]


** End of epoch, accumulated average loss = 1.620593 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 415
Run training...


527it [00:15, 34.19it/s]


** End of epoch, accumulated average loss = 1.620361 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 416
Run training...


527it [00:16, 32.68it/s]


** End of epoch, accumulated average loss = 1.620478 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 417
Run training...


527it [00:15, 34.13it/s]


** End of epoch, accumulated average loss = 1.620006 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 418
Run training...


527it [00:15, 34.09it/s]


** End of epoch, accumulated average loss = 1.620253 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 419
Run training...


527it [00:15, 33.71it/s]


** End of epoch, accumulated average loss = 1.619910 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 420
Run training...


527it [00:16, 32.41it/s]


** End of epoch, accumulated average loss = 1.619986 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 421
Run training...


527it [00:15, 33.80it/s]


** End of epoch, accumulated average loss = 1.619856 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 422
Run training...


527it [00:15, 34.16it/s]


** End of epoch, accumulated average loss = 1.619924 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 423
Run training...


527it [00:15, 33.91it/s]


** End of epoch, accumulated average loss = 1.619645 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 424
Run training...


527it [00:16, 32.05it/s]


** End of epoch, accumulated average loss = 1.619800 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 425
Run training...


527it [00:15, 34.26it/s]


** End of epoch, accumulated average loss = 1.619355 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 426
Run training...


527it [00:15, 34.34it/s]


** End of epoch, accumulated average loss = 1.619428 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 427
Run training...


527it [00:15, 34.16it/s]


** End of epoch, accumulated average loss = 1.619687 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 428
Run training...


527it [00:16, 32.17it/s]


** End of epoch, accumulated average loss = 1.620153 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 429
Run training...


527it [00:15, 34.33it/s]


** End of epoch, accumulated average loss = 1.619357 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 430
Run training...


527it [00:15, 34.15it/s]


** End of epoch, accumulated average loss = 1.619301 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 431
Run training...


527it [00:15, 34.32it/s]


** End of epoch, accumulated average loss = 1.619084 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 432
Run training...


527it [00:16, 32.29it/s]


** End of epoch, accumulated average loss = 1.618989 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 433
Run training...


527it [00:15, 34.09it/s]


** End of epoch, accumulated average loss = 1.619331 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 434
Run training...


527it [00:15, 34.00it/s]


** End of epoch, accumulated average loss = 1.619397 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 435
Run training...


527it [00:15, 33.95it/s]


** End of epoch, accumulated average loss = 1.619243 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 436
Run training...


527it [00:16, 32.37it/s]


** End of epoch, accumulated average loss = 1.619498 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 437
Run training...


527it [00:15, 34.10it/s]


** End of epoch, accumulated average loss = 1.618875 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 438
Run training...


527it [00:15, 34.15it/s]


** End of epoch, accumulated average loss = 1.618826 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 439
Run training...


527it [00:15, 33.84it/s]


** End of epoch, accumulated average loss = 1.618799 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 440
Run training...


527it [00:16, 31.92it/s]


** End of epoch, accumulated average loss = 1.618598 **
** Elapsed time: 0:00:17**

----------------------------------------
Epoch: 441
Run training...


527it [00:15, 33.95it/s]


** End of epoch, accumulated average loss = 1.619167 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 442
Run training...


527it [00:15, 34.46it/s]


** End of epoch, accumulated average loss = 1.618967 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 443
Run training...


527it [00:15, 34.41it/s]


** End of epoch, accumulated average loss = 1.618700 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 444
Run training...


527it [00:16, 32.35it/s]


** End of epoch, accumulated average loss = 1.619198 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 445
Run training...


527it [00:15, 33.89it/s]


** End of epoch, accumulated average loss = 1.618772 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 446
Run training...


527it [00:15, 34.13it/s]


** End of epoch, accumulated average loss = 1.619008 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 447
Run training...


527it [00:15, 34.06it/s]


** End of epoch, accumulated average loss = 1.618562 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 448
Run training...


527it [00:16, 32.39it/s]


** End of epoch, accumulated average loss = 1.617881 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 449
Run training...


527it [00:15, 33.63it/s]


** End of epoch, accumulated average loss = 1.617860 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 450
Run training...


527it [00:15, 33.85it/s]


** End of epoch, accumulated average loss = 1.618140 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 451
Run training...


527it [00:15, 33.53it/s]


** End of epoch, accumulated average loss = 1.618242 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 452
Run training...


527it [00:16, 32.45it/s]


** End of epoch, accumulated average loss = 1.618566 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 453
Run training...


527it [00:15, 33.04it/s]


** End of epoch, accumulated average loss = 1.617799 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 454
Run training...


527it [00:15, 34.23it/s]


** End of epoch, accumulated average loss = 1.617904 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 455
Run training...


527it [00:16, 32.60it/s]


** End of epoch, accumulated average loss = 1.617924 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 456
Run training...


527it [00:15, 33.14it/s]


** End of epoch, accumulated average loss = 1.618066 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 457
Run training...


527it [00:15, 33.40it/s]


** End of epoch, accumulated average loss = 1.618206 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 458
Run training...


527it [00:15, 33.71it/s]


** End of epoch, accumulated average loss = 1.617623 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 459
Run training...


527it [00:16, 32.34it/s]


** End of epoch, accumulated average loss = 1.617922 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 460
Run training...


527it [00:15, 33.02it/s]


** End of epoch, accumulated average loss = 1.617877 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 461
Run training...


527it [00:15, 33.70it/s]


** End of epoch, accumulated average loss = 1.617431 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 462
Run training...


527it [00:17, 29.29it/s]


** End of epoch, accumulated average loss = 1.618105 **
** Elapsed time: 0:00:18**

----------------------------------------
Epoch: 463
Run training...


527it [00:17, 30.12it/s]


** End of epoch, accumulated average loss = 1.617576 **
** Elapsed time: 0:00:18**

----------------------------------------
Epoch: 464
Run training...


527it [00:15, 33.43it/s]


** End of epoch, accumulated average loss = 1.617289 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 465
Run training...


527it [00:15, 33.30it/s]


** End of epoch, accumulated average loss = 1.617592 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 466
Run training...


527it [00:16, 32.26it/s]


** End of epoch, accumulated average loss = 1.617371 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 467
Run training...


527it [00:15, 34.25it/s]


** End of epoch, accumulated average loss = 1.617464 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 468
Run training...


527it [00:15, 33.44it/s]


** End of epoch, accumulated average loss = 1.617491 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 469
Run training...


527it [00:16, 32.82it/s]


** End of epoch, accumulated average loss = 1.616793 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 470
Run training...


527it [00:16, 32.78it/s]


** End of epoch, accumulated average loss = 1.616987 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 471
Run training...


527it [00:15, 34.23it/s]


** End of epoch, accumulated average loss = 1.616956 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 472
Run training...


527it [00:15, 33.77it/s]


** End of epoch, accumulated average loss = 1.616550 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 473
Run training...


527it [00:16, 32.87it/s]


** End of epoch, accumulated average loss = 1.617170 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 474
Run training...


527it [00:16, 32.92it/s]


** End of epoch, accumulated average loss = 1.617349 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 475
Run training...


527it [00:15, 34.01it/s]


** End of epoch, accumulated average loss = 1.617003 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 476
Run training...


527it [00:15, 33.52it/s]


** End of epoch, accumulated average loss = 1.616699 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 477
Run training...


527it [00:16, 32.21it/s]


** End of epoch, accumulated average loss = 1.616733 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 478
Run training...


527it [00:15, 33.60it/s]


** End of epoch, accumulated average loss = 1.616794 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 479
Run training...


527it [00:15, 33.67it/s]


** End of epoch, accumulated average loss = 1.617065 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 480
Run training...


527it [00:15, 33.35it/s]


** End of epoch, accumulated average loss = 1.616020 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 481
Run training...


527it [00:16, 31.97it/s]


** End of epoch, accumulated average loss = 1.616486 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 482
Run training...


527it [00:15, 33.85it/s]


** End of epoch, accumulated average loss = 1.616269 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 483
Run training...


527it [00:15, 33.92it/s]


** End of epoch, accumulated average loss = 1.616454 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 484
Run training...


527it [00:15, 33.83it/s]


** End of epoch, accumulated average loss = 1.616597 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 485
Run training...


527it [00:16, 31.76it/s]


** End of epoch, accumulated average loss = 1.616167 **
** Elapsed time: 0:00:17**

----------------------------------------
Epoch: 486
Run training...


527it [00:15, 33.78it/s]


** End of epoch, accumulated average loss = 1.615672 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 487
Run training...


527it [00:15, 33.52it/s]


** End of epoch, accumulated average loss = 1.616208 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 488
Run training...


527it [00:15, 33.95it/s]


** End of epoch, accumulated average loss = 1.616140 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 489
Run training...


527it [00:16, 32.12it/s]


** End of epoch, accumulated average loss = 1.616335 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 490
Run training...


527it [00:15, 33.72it/s]


** End of epoch, accumulated average loss = 1.616221 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 491
Run training...


527it [00:15, 33.36it/s]


** End of epoch, accumulated average loss = 1.616216 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 492
Run training...


527it [00:15, 32.96it/s]


** End of epoch, accumulated average loss = 1.616103 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 493
Run training...


527it [00:16, 32.83it/s]


** End of epoch, accumulated average loss = 1.615863 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 494
Run training...


527it [00:15, 33.66it/s]


** End of epoch, accumulated average loss = 1.616271 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 495
Run training...


527it [00:15, 33.55it/s]


** End of epoch, accumulated average loss = 1.616214 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 496
Run training...


527it [00:16, 31.87it/s]


** End of epoch, accumulated average loss = 1.616071 **
** Elapsed time: 0:00:17**

----------------------------------------
Epoch: 497
Run training...


527it [00:15, 33.66it/s]


** End of epoch, accumulated average loss = 1.615891 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 498
Run training...


527it [00:15, 33.70it/s]


** End of epoch, accumulated average loss = 1.615704 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 499
Run training...


527it [00:15, 33.14it/s]


** End of epoch, accumulated average loss = 1.615752 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 500
Run training...


527it [00:16, 32.38it/s]


** End of epoch, accumulated average loss = 1.615883 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 501
Run training...


527it [00:15, 33.47it/s]


** End of epoch, accumulated average loss = 1.615449 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 502
Run training...


527it [00:15, 33.47it/s]


** End of epoch, accumulated average loss = 1.615633 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 503
Run training...


527it [00:15, 33.35it/s]


** End of epoch, accumulated average loss = 1.615288 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 504
Run training...


527it [00:16, 32.55it/s]


** End of epoch, accumulated average loss = 1.615547 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 505
Run training...


527it [00:15, 33.52it/s]


** End of epoch, accumulated average loss = 1.615745 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 506
Run training...


527it [00:15, 33.35it/s]


** End of epoch, accumulated average loss = 1.615182 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 507
Run training...


527it [00:16, 32.73it/s]


** End of epoch, accumulated average loss = 1.615529 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 508
Run training...


527it [00:15, 33.20it/s]


** End of epoch, accumulated average loss = 1.615539 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 509
Run training...


527it [00:15, 33.65it/s]


** End of epoch, accumulated average loss = 1.614475 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 510
Run training...


527it [00:15, 33.22it/s]


** End of epoch, accumulated average loss = 1.614783 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 511
Run training...


527it [00:16, 31.67it/s]


** End of epoch, accumulated average loss = 1.615045 **
** Elapsed time: 0:00:17**

----------------------------------------
Epoch: 512
Run training...


527it [00:15, 34.05it/s]


** End of epoch, accumulated average loss = 1.615196 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 513
Run training...


527it [00:15, 33.77it/s]


** End of epoch, accumulated average loss = 1.615235 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 514
Run training...


527it [00:15, 33.39it/s]


** End of epoch, accumulated average loss = 1.615171 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 515
Run training...


527it [00:16, 32.01it/s]


** End of epoch, accumulated average loss = 1.614655 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 516
Run training...


527it [00:15, 33.72it/s]


** End of epoch, accumulated average loss = 1.614989 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 517
Run training...


527it [00:15, 33.52it/s]


** End of epoch, accumulated average loss = 1.615225 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 518
Run training...


527it [00:15, 32.96it/s]


** End of epoch, accumulated average loss = 1.615190 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 519
Run training...


527it [00:16, 32.30it/s]


** End of epoch, accumulated average loss = 1.614990 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 520
Run training...


527it [00:15, 33.89it/s]


** End of epoch, accumulated average loss = 1.614543 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 521
Run training...


527it [00:15, 33.38it/s]


** End of epoch, accumulated average loss = 1.614699 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 522
Run training...


527it [00:16, 32.43it/s]


** End of epoch, accumulated average loss = 1.614919 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 523
Run training...


527it [00:15, 33.42it/s]


** End of epoch, accumulated average loss = 1.615250 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 524
Run training...


527it [00:15, 33.84it/s]


** End of epoch, accumulated average loss = 1.614512 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 525
Run training...


527it [00:15, 33.58it/s]


** End of epoch, accumulated average loss = 1.615081 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 526
Run training...


527it [00:16, 31.71it/s]


** End of epoch, accumulated average loss = 1.614034 **
** Elapsed time: 0:00:17**

----------------------------------------
Epoch: 527
Run training...


527it [00:15, 33.63it/s]


** End of epoch, accumulated average loss = 1.614342 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 528
Run training...


527it [00:15, 33.88it/s]


** End of epoch, accumulated average loss = 1.614520 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 529
Run training...


527it [00:15, 33.65it/s]


** End of epoch, accumulated average loss = 1.614536 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 530
Run training...


527it [00:16, 31.60it/s]


** End of epoch, accumulated average loss = 1.614532 **
** Elapsed time: 0:00:17**

----------------------------------------
Epoch: 531
Run training...


527it [00:15, 33.43it/s]


** End of epoch, accumulated average loss = 1.614365 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 532
Run training...


527it [00:15, 33.81it/s]


** End of epoch, accumulated average loss = 1.614174 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 533
Run training...


527it [00:15, 33.22it/s]


** End of epoch, accumulated average loss = 1.614544 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 534
Run training...


527it [00:16, 32.47it/s]


** End of epoch, accumulated average loss = 1.613865 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 535
Run training...


527it [00:15, 33.54it/s]


** End of epoch, accumulated average loss = 1.613846 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 536
Run training...


527it [00:15, 33.69it/s]


** End of epoch, accumulated average loss = 1.614158 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 537
Run training...


527it [00:16, 32.39it/s]


** End of epoch, accumulated average loss = 1.614265 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 538
Run training...


527it [00:15, 32.99it/s]


** End of epoch, accumulated average loss = 1.614183 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 539
Run training...


527it [00:15, 33.60it/s]


** End of epoch, accumulated average loss = 1.613620 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 540
Run training...


527it [00:15, 33.57it/s]


** End of epoch, accumulated average loss = 1.614072 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 541
Run training...


527it [00:16, 32.39it/s]


** End of epoch, accumulated average loss = 1.613765 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 542
Run training...


527it [00:15, 33.79it/s]


** End of epoch, accumulated average loss = 1.613616 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 543
Run training...


527it [00:15, 33.58it/s]


** End of epoch, accumulated average loss = 1.613292 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 544
Run training...


527it [00:15, 33.87it/s]


** End of epoch, accumulated average loss = 1.614175 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 545
Run training...


527it [00:16, 32.22it/s]


** End of epoch, accumulated average loss = 1.613569 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 546
Run training...


527it [00:15, 33.49it/s]


** End of epoch, accumulated average loss = 1.613651 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 547
Run training...


527it [00:15, 33.60it/s]


** End of epoch, accumulated average loss = 1.613119 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 548
Run training...


527it [00:15, 33.48it/s]


** End of epoch, accumulated average loss = 1.613372 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 549
Run training...


527it [00:16, 32.29it/s]


** End of epoch, accumulated average loss = 1.613239 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 550
Run training...


527it [00:15, 33.72it/s]


** End of epoch, accumulated average loss = 1.613421 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 551
Run training...


527it [00:15, 33.21it/s]


** End of epoch, accumulated average loss = 1.613819 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 552
Run training...


527it [00:16, 32.64it/s]


** End of epoch, accumulated average loss = 1.613455 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 553
Run training...


527it [00:15, 33.26it/s]


** End of epoch, accumulated average loss = 1.613021 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 554
Run training...


527it [00:15, 33.97it/s]


** End of epoch, accumulated average loss = 1.613397 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 555
Run training...


527it [00:15, 33.66it/s]


** End of epoch, accumulated average loss = 1.613277 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 556
Run training...


527it [00:16, 32.43it/s]


** End of epoch, accumulated average loss = 1.612993 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 557
Run training...


527it [00:15, 33.01it/s]


** End of epoch, accumulated average loss = 1.613489 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 558
Run training...


527it [00:15, 33.79it/s]


** End of epoch, accumulated average loss = 1.613376 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 559
Run training...


527it [00:15, 33.68it/s]


** End of epoch, accumulated average loss = 1.613282 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 560
Run training...


527it [00:16, 32.29it/s]


** End of epoch, accumulated average loss = 1.612837 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 561
Run training...


527it [00:15, 33.91it/s]


** End of epoch, accumulated average loss = 1.613157 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 562
Run training...


527it [00:15, 33.46it/s]


** End of epoch, accumulated average loss = 1.612551 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 563
Run training...


527it [00:15, 33.79it/s]


** End of epoch, accumulated average loss = 1.613036 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 564
Run training...


527it [00:16, 31.97it/s]


** End of epoch, accumulated average loss = 1.613141 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 565
Run training...


527it [00:15, 34.08it/s]


** End of epoch, accumulated average loss = 1.613110 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 566
Run training...


527it [00:15, 33.51it/s]


** End of epoch, accumulated average loss = 1.612425 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 567
Run training...


527it [00:15, 33.41it/s]


** End of epoch, accumulated average loss = 1.612564 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 568
Run training...


527it [00:16, 32.09it/s]


** End of epoch, accumulated average loss = 1.612387 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 569
Run training...


527it [00:15, 34.26it/s]


** End of epoch, accumulated average loss = 1.612718 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 570
Run training...


527it [00:15, 33.70it/s]


** End of epoch, accumulated average loss = 1.612878 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 571
Run training...


527it [00:15, 33.44it/s]


** End of epoch, accumulated average loss = 1.612957 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 572
Run training...


527it [00:16, 31.84it/s]


** End of epoch, accumulated average loss = 1.612512 **
** Elapsed time: 0:00:17**

----------------------------------------
Epoch: 573
Run training...


527it [00:15, 34.24it/s]


** End of epoch, accumulated average loss = 1.612822 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 574
Run training...


527it [00:15, 33.41it/s]


** End of epoch, accumulated average loss = 1.612773 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 575
Run training...


527it [00:15, 33.54it/s]


** End of epoch, accumulated average loss = 1.612901 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 576
Run training...


527it [00:16, 32.30it/s]


** End of epoch, accumulated average loss = 1.612151 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 577
Run training...


527it [00:15, 33.52it/s]


** End of epoch, accumulated average loss = 1.612596 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 578
Run training...


527it [00:15, 33.41it/s]


** End of epoch, accumulated average loss = 1.612541 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 579
Run training...


527it [00:16, 32.76it/s]


** End of epoch, accumulated average loss = 1.612134 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 580
Run training...


527it [00:15, 33.33it/s]


** End of epoch, accumulated average loss = 1.612264 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 581
Run training...


527it [00:15, 33.57it/s]


** End of epoch, accumulated average loss = 1.612567 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 582
Run training...


527it [00:15, 33.73it/s]


** End of epoch, accumulated average loss = 1.612033 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 583
Run training...


527it [00:16, 32.17it/s]


** End of epoch, accumulated average loss = 1.612272 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 584
Run training...


527it [00:15, 34.14it/s]


** End of epoch, accumulated average loss = 1.612375 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 585
Run training...


527it [00:15, 34.03it/s]


** End of epoch, accumulated average loss = 1.612052 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 586
Run training...


527it [00:15, 33.93it/s]


** End of epoch, accumulated average loss = 1.612083 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 587
Run training...


527it [00:16, 31.95it/s]


** End of epoch, accumulated average loss = 1.612184 **
** Elapsed time: 0:00:17**

----------------------------------------
Epoch: 588
Run training...


527it [00:15, 33.97it/s]


** End of epoch, accumulated average loss = 1.611742 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 589
Run training...


527it [00:15, 33.60it/s]


** End of epoch, accumulated average loss = 1.612358 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 590
Run training...


527it [00:15, 34.09it/s]


** End of epoch, accumulated average loss = 1.612182 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 591
Run training...


527it [00:16, 31.79it/s]


** End of epoch, accumulated average loss = 1.611626 **
** Elapsed time: 0:00:17**

----------------------------------------
Epoch: 592
Run training...


527it [00:15, 33.83it/s]


** End of epoch, accumulated average loss = 1.612355 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 593
Run training...


527it [00:15, 33.52it/s]


** End of epoch, accumulated average loss = 1.611583 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 594
Run training...


527it [00:15, 34.05it/s]


** End of epoch, accumulated average loss = 1.612064 **
** Elapsed time: 0:00:15**

----------------------------------------
Epoch: 595
Run training...


527it [00:16, 31.89it/s]


** End of epoch, accumulated average loss = 1.611885 **
** Elapsed time: 0:00:17**

----------------------------------------
Epoch: 596
Run training...


527it [00:15, 33.77it/s]


** End of epoch, accumulated average loss = 1.611948 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 597
Run training...


527it [00:15, 33.28it/s]


** End of epoch, accumulated average loss = 1.611949 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 598
Run training...


527it [00:15, 33.39it/s]


** End of epoch, accumulated average loss = 1.611562 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 599
Run training...


527it [00:16, 32.47it/s]


** End of epoch, accumulated average loss = 1.612486 **
** Elapsed time: 0:00:16**

----------------------------------------
Epoch: 600
Run training...


527it [00:15, 33.83it/s]

** End of epoch, accumulated average loss = 1.611773 **
** Elapsed time: 0:00:16**
Classifier trained in 9521.58s





In [None]:
allpreds = []
# Predict for every data part; part2ixy maps part name -> (ids, inputs, labels).
for part, (ids, x, y) in part2ixy.items():
    print('\nClassifying %s set with %d examples ...' % (part, len(x)))
    st = time.time()
    preds = classify(x, params)
    print('%s set classified in %.2fs' % (part, time.time() - st))
    # Every example must come with exactly top_k candidate transliterations.
    assert all(len(p) == top_k for p in preds)
    #score(preds, y)
    allpreds.extend(zip(ids, preds))

# Write all predictions to one TSV, then score the parts listed in SCORED_PARTS.
save_preds(allpreds, preds_fname=PREDS_FNAME)
print('\nChecking saved predictions ...')
score_preds(preds_path=PREDS_FNAME, data_dir=TRANSLIT_PATH, parts=SCORED_PARTS)


Classifying train set with 105371 examples ...
Using GPU device: cuda


100%|██████████| 527/527 [01:28<00:00,  5.94it/s]


train set classified in 88.75s

Classifying dev set with 26342 examples ...
Using GPU device: cuda


100%|██████████| 132/132 [00:22<00:00,  5.85it/s]


dev set classified in 22.57s

Classifying train_small set with 2000 examples ...
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.41it/s]


train_small set classified in 1.57s

Classifying dev_small set with 2000 examples ...
Using GPU device: cuda


100%|██████████| 10/10 [00:01<00:00,  6.35it/s]


dev_small set classified in 1.58s

Classifying test set with 32926 examples ...
Using GPU device: cuda


100%|██████████| 165/165 [00:27<00:00,  5.91it/s]


test set classified in 27.94s
Predictions saved to preds_translit.tsv

Checking saved predictions ...
train set accuracy@1: 0.70
dev set accuracy@1: 0.66
train_small set accuracy@1: 0.70
dev_small set accuracy@1: 0.67
no labels for test set


{'train': {'acc@1': 0.7015402719913448},
 'dev': {'acc@1': 0.6581504821198086},
 'train_small': {'acc@1': 0.696},
 'dev_small': {'acc@1': 0.6695}}
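
The `acc@1` values above report the share of examples whose top-ranked prediction matches the reference. The sketch below shows one way such a metric can be computed. It is only an illustration: the function name `accuracy_at_1`, the dictionary layout, and the toy data are assumptions, and the `score_preds` helper used above may differ (for example, in how it reads the TSV file or handles missing ids).

```python
from typing import Dict, Sequence


def accuracy_at_1(preds: Dict[str, Sequence[str]], gold: Dict[str, str]) -> float:
    """Share of gold examples whose first (top-ranked) prediction equals the reference.

    Hypothetical helper: examples with no prediction count as errors.
    """
    if not gold:
        raise ValueError("gold must contain at least one example")
    hits = sum(
        1
        for ex_id, ref in gold.items()
        if preds.get(ex_id) and preds[ex_id][0] == ref
    )
    return hits / len(gold)


# Toy data (made up for illustration): id -> ranked candidates, id -> reference.
preds = {"1": ["анна", "ана"], "2": ["джон", "йон"], "3": ["мери", "мэри"]}
gold = {"1": "анна", "2": "джон", "3": "мэри"}

acc = accuracy_at_1(preds, gold)  # 2 of 3 top-1 predictions match
print(f"acc@1 = {acc:.2f}")  # prints "acc@1 = 0.67"
```

Only the first candidate of each list is compared, so the correct answer at rank 2 for id `3` does not count towards `acc@1`.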

In [None]:
!zip preds_translit.zip preds_translit.tsv

  adding: preds_translit.tsv (deflated 70%)
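
The `!zip` call relies on the `zip` binary being available in the notebook environment. Where it is not, the same archive can be produced with Python's standard `zipfile` module. This is a minimal sketch: the helper name `zip_predictions` is hypothetical, and placing the TSV at the archive root is an assumption that mirrors what the `zip` command above does. The demo runs on a throwaway file in a temporary directory rather than on the real predictions.

```python
import os
import tempfile
import zipfile


def zip_predictions(tsv_path: str, zip_path: str) -> str:
    """Write tsv_path into a deflate-compressed archive at zip_path."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # arcname keeps only the file name, so the TSV sits at the archive root.
        zf.write(tsv_path, arcname=os.path.basename(tsv_path))
    return zip_path


# Demo on a temporary file; the TSV content here is a placeholder, not real output.
with tempfile.TemporaryDirectory() as tmp:
    tsv = os.path.join(tmp, "preds_translit.tsv")
    with open(tsv, "w", encoding="utf-8") as f:
        f.write("id\tпример\n")
    archive = zip_predictions(tsv, os.path.join(tmp, "preds_translit.zip"))
    with zipfile.ZipFile(archive) as zf:
        names = zf.namelist()

print(names)  # prints "['preds_translit.tsv']"
```

In the notebook itself the call would be `zip_predictions('preds_translit.tsv', 'preds_translit.zip')`.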
