# Practical Assignment 3

# Sentence Classification with BERT

## Course: "Mathematical Methods of Text Analysis"


### Full name: АМИНОВ ТИМУР ВЕНЕРОВИЧ

## Introduction

### Problem statement

In this assignment you will classify sentences from medical articles into several classes (background, objective, etc.).
To improve the quality of the solution, you are asked to fine-tune the pretrained BERT neural network architecture.

### Libraries

You will need the following libraries for this assignment:
 - [PyTorch](https://pytorch.org/).
 - [Transformers](https://github.com/huggingface/transformers).

### Data

The data can be downloaded here: [Google Drive link](https://drive.google.com/file/d/13HlWH8jnmsxqDKrEptxOXQg9kkuQMmGq/view?usp=sharing)

## Part 1. Data preparation

We will work with sentences from medical articles, divided into several classes.

In [0]:
import re
from collections import Counter

In [0]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


Path to the data folder:

In [0]:
DATA_PATH = "/content/drive/My Drive/BERT/sentence_classification_data"

Data-reading function:

In [0]:
def read_data(file_name):
    """
    Parameters
    ----------
    file_name : str
        Pubmed sentences file path
        
    Returns
    -------
    text_data : list of str
        List of sentences for algorithm
    
    target_data : list of str
        List of sentence categories
    """
    text_data = []
    target_data = []

    with open(file_name, 'r') as f_input:
        for line in f_input:
            if line.startswith('#') or line == '\n':
                continue
            target, text = line.split('\t')[:2]    

            text_data.append(text)
            target_data.append(target)
    
    return text_data, target_data

Reading the data:

In [0]:
train_data, train_target = read_data(f'{DATA_PATH}/data_train.txt')
test_data, test_target = read_data(f'{DATA_PATH}/data_test.txt')
dev_data, dev_target = read_data(f'{DATA_PATH}/data_dev.txt')

In [0]:
test_data[10]

'ACTRN12612000642886 .\n'

In [0]:
train_target[20]

'RESULTS'

## Part 2. Building a baseline (1 point)

In this part of the assignment you need to build a baseline model against which you will compare your solution. As the baseline, you are asked to use a logistic regression model on tf-idf representations.

In [0]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score

Before feeding sentences to the model, preprocess them:

1. convert all sentences to lowercase
2. remove from the sentences all non-whitespace characters other than letters and digits
3. replace all digits with zeros

The labels must be converted from text to numeric form (this can be done with LabelEncoder).

Then build a tf-idf matrix for the sentences (fit tf-idf on train_data only!) and train a logistic regression model on it. Use the dev set to select the model's hyperparameters. Make sure that accuracy on both the test and dev sets is above 0.8.
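The three preprocessing steps listed above can be sketched as a single helper. This is a minimal sketch under stated assumptions, not the notebook's own implementation: the name `preprocess` is hypothetical, "letters and digits" is taken to mean whatever `str.isalnum` accepts, and runs of whitespace (including the trailing newline) are collapsed to single spaces.

```python
import re


def preprocess(sentence):
    """Lowercase, keep only letters/digits/spaces, and map every digit to 0."""
    # Step 1: lowercase.
    text = sentence.lower()
    # Step 2: drop every non-whitespace character that is not a letter or digit.
    text = "".join(ch for ch in text if ch.isalnum() or ch.isspace())
    # Collapse whitespace runs (and strip the trailing newline).
    text = " ".join(text.split())
    # Step 3: replace each ASCII digit with 0.
    return re.sub(r"[0-9]", "0", text)


print(preprocess("ACTRN12612000642886 .\n"))  # actrn00000000000000
```

Wrapping the steps in one function keeps the train, dev and test splits on exactly the same transformation.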

In [0]:
for i in range(len(test_data)):
    # Collapse whitespace, keep only letters, digits and spaces
    normalized = ' '.join(test_data[i].split())
    cleaned = "".join(ch for ch in normalized if ch.isalnum() or ch == ' ').replace("  ", " ")
    # Replace every digit with 0 and lowercase
    test_data[i] = re.sub(r'[0-9]', '0', cleaned).lower()

In [0]:
for i in range(len(train_data)):
    # Collapse whitespace, keep only letters, digits and spaces
    normalized = ' '.join(train_data[i].split())
    cleaned = "".join(ch for ch in normalized if ch.isalnum() or ch == ' ').replace("  ", " ")
    # Replace every digit with 0 and lowercase
    train_data[i] = re.sub(r'[0-9]', '0', cleaned).lower()

In [0]:
for i in range(len(dev_data)):
    # Collapse whitespace, keep only letters, digits and spaces
    normalized = ' '.join(dev_data[i].split())
    cleaned = "".join(ch for ch in normalized if ch.isalnum() or ch == ' ').replace("  ", " ")
    # Replace every digit with 0 and lowercase
    dev_data[i] = re.sub(r'[0-9]', '0', cleaned).lower()

In [0]:
test_data[10]

'actrn00000000000000 '

In [0]:
le = LabelEncoder()
le.fit(train_target)
train_target = le.transform(train_target)
test_target = le.transform(test_target)
dev_target = le.transform(dev_target)

In [0]:
train_target[20]

4

In [0]:
vectorizer = TfidfVectorizer(ngram_range=(1,2))
X_train = vectorizer.fit_transform(train_data)

In [0]:
X_test = vectorizer.transform(test_data)

In [0]:
X_dev = vectorizer.transform(dev_data)

In [0]:
from sklearn.model_selection import cross_val_score, GridSearchCV
# make_scorer is exported from sklearn.metrics; the sklearn.metrics.scorer module is gone in newer releases
from sklearn.metrics import accuracy_score, f1_score, make_scorer
import numpy as np

In [0]:
lr_grid = {
    'C': np.logspace(-2, 2, 10),
}


In [0]:
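# The L1 penalty needs a solver that supports it; the [LibLinear] output below shows liblinear was used
clf = LogisticRegression(penalty='l1', solver='liblinear', C=10, verbose=10)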

clf.fit(X_train , train_target)



[LibLinear]




In [0]:
a_scorer = make_scorer(accuracy_score)

In [0]:
a_scorer(clf , X_test, test_target)

0.79444242559227263

In [0]:
a_scorer(clf , X_dev, dev_target)

0.79472556339001799

In [0]:
gs_l2 = GridSearchCV(LogisticRegression(penalty='l2'), lr_grid, scoring=a_scorer, cv=5, n_jobs=4, verbose = 10)
gs_l2.fit(X_train , train_target)

Fitting 5 folds for each of 10 candidates, totalling 50 fits


[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done   5 tasks      | elapsed:    7.8s
[Parallel(n_jobs=4)]: Done  10 tasks      | elapsed:   12.5s
[Parallel(n_jobs=4)]: Done  17 tasks      | elapsed:   22.0s
[Parallel(n_jobs=4)]: Done  24 tasks      | elapsed:   31.3s
[Parallel(n_jobs=4)]: Done  33 tasks      | elapsed:   52.5s
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:  1.3min
[Parallel(n_jobs=4)]: Done  50 out of  50 | elapsed:  1.8min finished



Success: accuracy is above 0.8 on both the test and dev sets, as the next two cells show.


In [0]:
a_scorer(gs_l2 , X_test, test_target)

0.80727255676556442

In [0]:
a_scorer(gs_l2 , X_dev, dev_target)

0.80931148900871008

In [0]:
gs_l1 = GridSearchCV(LogisticRegression(penalty='l1', solver='liblinear'), lr_grid, scoring=a_scorer, cv=5, n_jobs=4, verbose=10)
gs_l1.fit(X_train , train_target)

Fitting 5 folds for each of 10 candidates, totalling 50 fits


[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done   5 tasks      | elapsed:    6.2s
[Parallel(n_jobs=4)]: Done  10 tasks      | elapsed:   12.0s
[Parallel(n_jobs=4)]: Done  17 tasks      | elapsed:   27.6s
[Parallel(n_jobs=4)]: Done  24 tasks      | elapsed:   39.6s
[Parallel(n_jobs=4)]: Done  33 tasks      | elapsed:  1.0min
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:  1.8min
[Parallel(n_jobs=4)]: Done  50 out of  50 | elapsed:  2.3min finished



In [0]:
a_scorer(gs_l1 , X_test, test_target)

0.79972335212819579

In [0]:
a_scorer(gs_l1 , X_dev, dev_target)

0.80188027098022951

## Part 3. Setting up BERT (4 points for Parts 3 and 4)

Since there are very few training sentences, let's try a BERT model pretrained on a large dataset. We will use the transformers library. To train the model, use the data as it was before the preprocessing of the previous part.

In [0]:
train_data, y_train = read_data(f'{DATA_PATH}/data_train.txt')
test_data, y_test = read_data(f'{DATA_PATH}/data_test.txt')
dev_data, y_dev = read_data(f'{DATA_PATH}/data_dev.txt')

In [0]:
le = LabelEncoder()
y_train = le.fit_transform(y_train)
y_dev = le.transform(y_dev)
# Reuse the encoder fitted on train; refitting on test could change the label mapping
y_test = le.transform(y_test)

In [0]:
set(y_train)

{0, 1, 2, 3, 4}

In [0]:
BERT_MODEL_NAME = "bert-base-uncased"
NUM_LABELS = len(set(y_train))

In [0]:
# This notebook was run with transformers 2.1.1; later releases replace
# WarmupLinearSchedule with get_linear_schedule_with_warmup
!pip install transformers==2.1.1

In [0]:
from transformers import BertTokenizer, BertForSequenceClassification
from transformers import AdamW, WarmupLinearSchedule

import torch
from torch.utils.data import DataLoader, Dataset

The BERT model works with a special data format: every sentence is split into subword tokens with the WordPiece algorithm (a subword scheme similar to BPE). The BertTokenizer class produces this subword split for a sentence.

In [0]:
tokenizer = BertTokenizer.from_pretrained(BERT_MODEL_NAME)



In [0]:
d  = tokenizer.tokenize(train_data[0])

In [0]:
type(d)

list

['many',
 'pathogen',
 '##ic',
 'processes',
 'and',
 'diseases',
 'are',
 'the',
 'result',
 'of',
 'an',
 'er',
 '##rone',
 '##ous',
 'activation',
 'of',
 'the',
 'complement',
 'cascade',
 'and',
 'a',
 'number',
 'of',
 'inhibitors',
 'of',
 'complement',
 'have',
 'thus',
 'been',
 'examined',
 'for',
 'anti',
 '-',
 'inflammatory',
 'actions',
 '.']

In [0]:
ids_review = tokenizer.convert_tokens_to_ids(d)

In [0]:
ids_review

[2116,
 26835,
 2594,
 6194,
 1998,
 7870,
 2024,
 1996,
 2765,
 1997,
 2019,
 9413,
 20793,
 3560,
 13791,
 1997,
 1996,
 13711,
 16690,
 1998,
 1037,
 2193,
 1997,
 25456,
 1997,
 13711,
 2031,
 2947,
 2042,
 8920,
 2005,
 3424,
 1011,
 20187,
 4506,
 1012]
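The split shown above ("pathogenic" into `pathogen` + `##ic`) comes from a greedy longest-match-first search over the tokenizer's vocabulary, with `##` marking pieces that continue a word. The following is a simplified sketch of that idea, not the library code: the function name `wordpiece` and the tiny `toy_vocab` are hypothetical and chosen only to reproduce two splits from the output above.

```python
def wordpiece(word, vocab, unk_token="[UNK]"):
    """Split one lowercase word into subword pieces by greedy longest match."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry '##'
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk_token]  # no piece matches: the whole word is unknown
        pieces.append(piece)
        start = end
    return pieces


# Toy vocabulary (an assumption for illustration, not the real BERT vocabulary).
toy_vocab = {"many", "pathogen", "##ic", "er", "##rone", "##ous"}
print(wordpiece("pathogenic", toy_vocab))  # ['pathogen', '##ic']
print(wordpiece("erroneous", toy_vocab))   # ['er', '##rone', '##ous']
```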

The transformers library has a dedicated class for classification tasks, BertForSequenceClassification. We use it to define the model.

In [0]:
bert_model = BertForSequenceClassification.from_pretrained(
    BERT_MODEL_NAME, num_labels=NUM_LABELS
)

bert_model.to('cuda')

Let's implement a custom dataset for sentences tokenized into WordPiece subwords. Each sentence must be converted into a sequence of subword indexes. Don't forget that every sentence must start with the special token [CLS] and end with the special token [SEP].

Define the dataset using BertTokenizer:

In [0]:
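# Note: bert-base-uncased already defines [CLS] and [SEP]; the dataset below
# inserts those built-in tokens directly, so this cell is not required.
special_tokens_dict = {'cls_token': '<CLS>',
                       'sep_token': '<SEP>'}

tokenizer.add_special_tokens(special_tokens_dict)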

0

In [0]:
from tqdm import tqdm

In [0]:
class BertTokenizedDataset(Dataset):
    def __init__(self, tokenizer, text_data, target_data=None, max_length=256):
        """
        Parameters
        ----------
        tokenizer : instance of BertTokenizer
        text_data : list of str
            List of input sentences
        target_data : list of int
            List of input targets
        max_length : int
            Maximum length of input sequence (in WordPiece tokens,
            including [CLS] and [SEP])
        """
        super(BertTokenizedDataset, self).__init__()
        tokenized_review = []
        for sent in tqdm(text_data):
            # Reserve two positions for the special tokens
            tokens = tokenizer.tokenize(sent)[:max_length - 2]
            tokens = ['[CLS]'] + tokens + ['[SEP]']
            ids_review = tokenizer.convert_tokens_to_ids(tokens)
            tokenized_review.append(torch.LongTensor(ids_review))

        self.target_data = target_data
        self.data = tokenized_review

    def __len__(self):
        return len(self.data)

    def __getitem__(self, i):
        if self.target_data is not None:
            return self.data[i], self.target_data[i]
        else:
            return self.data[i]

Build the datasets for all data splits.

**Note**. Once they are built, it makes sense to save all the datasets to disk, because preprocessing takes time.
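One way to follow this advice is a small "load or build" cache. The sketch below uses the standard `pickle` module; the helper name `load_or_build` and any file name passed to it are hypothetical, and the approach assumes the object being cached can be pickled.

```python
import os
import pickle


def load_or_build(cache_path, build_fn):
    """Return the cached object if cache_path exists, else build and cache it."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f_input:
            return pickle.load(f_input)
    obj = build_fn()  # the slow step runs only on a cache miss
    with open(cache_path, "wb") as f_output:
        pickle.dump(obj, f_output)
    return obj
```

In this notebook it could wrap the dataset constructor, for example `load_or_build('train_dataset.pkl', lambda: BertTokenizedDataset(tokenizer, train_data, y_train))`, where the file name is only an illustration. The cache has to be deleted by hand whenever the tokenizer or `max_length` changes.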

In [0]:
train_dataset = BertTokenizedDataset(tokenizer, train_data, y_train)
dev_dataset = BertTokenizedDataset(tokenizer, dev_data, y_dev)
test_dataset = BertTokenizedDataset(tokenizer, test_data, y_test)

100%|██████████| 29493/29493 [00:18<00:00, 1560.32it/s]
100%|██████████| 28932/28932 [00:18<00:00, 1553.74it/s]
100%|██████████| 170614/170614 [01:49<00:00, 1560.34it/s]


We use the PadSequences class to define a padding procedure that works with PyTorch's built-in DataLoader.

In [0]:
class PadSequences:
    def __init__(self, use_labels=False):
        self.use_labels = use_labels

    def __call__(self, batch):
        """
        Parameters
        ----------
        batch : list of objects or list of (object, label)
            Each object is a 1-D LongTensor of token indexes.
            Each label is int.
        """
        data_label_batch = batch if self.use_labels else [(x, 0) for x in batch]

        # Sort the batch by sequence length, longest first
        sorted_batch = sorted(data_label_batch, key=lambda x: x[0].shape[0], reverse=True)
        # Get each sequence and pad it with zeros up to the longest one
        sequences = [x[0] for x in sorted_batch]
        sequences_padded = torch.nn.utils.rnn.pad_sequence(sequences, batch_first=True)
        max_length = len(sequences[0])

        # Despite its name, `lengths` is an attention mask:
        # 1 for real tokens, 0 for padding positions
        lengths = torch.LongTensor([[1] * len(x) + [0] * (max_length - len(x)) for x in sequences])

        if self.use_labels:
            # Don't forget to grab the labels of the *sorted* batch
            labels = torch.LongTensor([x[1] for x in sorted_batch])
            return sequences_padded, lengths, labels
        else:
            return sequences_padded
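To make the effect of the collate function concrete, here is the same padding and mask logic on plain Python lists. This is an illustrative sketch only: `pad_batch` is a hypothetical name, the pad id of 0 matches the default fill value of `pad_sequence`, and 101 and 102 are the [CLS] and [SEP] ids of bert-base-uncased.

```python
def pad_batch(sequences, pad_id=0):
    """Right-pad lists of token ids to equal length and build a 0/1 mask."""
    max_length = max(len(seq) for seq in sequences)
    padded = [seq + [pad_id] * (max_length - len(seq)) for seq in sequences]
    # 1 marks a real token, 0 marks padding that attention should ignore.
    mask = [[1] * len(seq) + [0] * (max_length - len(seq)) for seq in sequences]
    return padded, mask


padded, mask = pad_batch([[101, 2116, 6194, 102], [101, 2116, 102]])
print(padded)  # [[101, 2116, 6194, 102], [101, 2116, 102, 0]]
print(mask)    # [[1, 1, 1, 1], [1, 1, 1, 0]]
```

The mask is what the model needs as `attention_mask`, so that padded positions do not influence the sentence representation.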

Define a DataLoader for each dataset:

In [0]:
BATCH_SIZE = 16

In [0]:
train_dataloader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True,
                              collate_fn=PadSequences(use_labels=True))

dev_dataloader = DataLoader(dev_dataset, batch_size=BATCH_SIZE, shuffle=False,
                              collate_fn=PadSequences(use_labels=True))

test_dataloader = DataLoader(test_dataset, batch_size=BATCH_SIZE, shuffle=False,
                              collate_fn=PadSequences(use_labels=True))

Note that a transformer model is usually trained with a fairly large batch size (typically 64), which most likely will not fit on your GPU. It is therefore recommended to "accumulate" gradients over several iterations. Use the ACCUMULATION_STEPS parameter to set how many iterations pass between optimizer steps.

In [0]:
EPOCH_AMOUNT = 2
TRAIN_LENGTH = len(train_dataset)
BATCH_SIZE = 16
ACCUMULATION_STEPS = 4

LR = 2e-5
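Gradient accumulation stands in for a large batch because the gradient of a mean loss over the large batch equals the average of the gradients over equally sized micro-batches. The toy check below illustrates this with a one-parameter loss $(w - x)^2$; all names and numbers are illustrative assumptions, not part of the notebook.

```python
def mean_gradient(w, xs):
    """Gradient of the mean of (w - x)**2 over xs with respect to w."""
    return sum(2.0 * (w - x) for x in xs) / len(xs)


w = 0.5
data = [float(i) for i in range(8)]                       # one "large" batch of 8 examples
micro_batches = [data[i:i + 2] for i in range(0, 8, 2)]   # 4 micro-batches of 2

accumulation_steps = len(micro_batches)
accumulated = 0.0
for xs in micro_batches:
    # Dividing by accumulation_steps turns the sum of micro-batch
    # gradients into their average.
    accumulated += mean_gradient(w, xs) / accumulation_steps

full = mean_gradient(w, data)
print(abs(accumulated - full) < 1e-12)  # True
```

In a PyTorch loop the same scaling is usually obtained by dividing the loss by the number of accumulation steps before calling `backward()`.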

Compute the total number of updates your optimizer will make, based on the chosen values of EPOCH_AMOUNT, BATCH_SIZE, ACCUMULATION_STEPS and TRAIN_LENGTH. This value is needed to configure the optimizer and scheduler correctly.

In [0]:
train_optimization_step_amount =(TRAIN_LENGTH // (BATCH_SIZE * ACCUMULATION_STEPS)) * EPOCH_AMOUNT
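The floor-based expression above is a close approximation. An exact count depends on how the loop is written, so the sketch below states its assumptions: the DataLoader keeps the last partial batch (the default), the batch counter runs continuously across epochs, and one update follows every `accumulation_steps` batches. The function name is hypothetical.

```python
import math


def optimizer_step_count(train_length, batch_size, accumulation_steps, epochs):
    """Number of optimizer updates when one update follows every
    `accumulation_steps` batches and the batch counter is not reset per epoch."""
    batches_per_epoch = math.ceil(train_length / batch_size)
    total_batches = batches_per_epoch * epochs
    return total_batches // accumulation_steps


# 29493 is the training-set size reported by the progress output above.
print(optimizer_step_count(29493, 16, 4, 2))  # 922
```

For the same values the notebook's expression gives 920, so the scheduler reaches zero learning rate a couple of updates before training ends; the difference is negligible here.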

Set up the optimizer and scheduler. We will use AdamW and WarmupLinearSchedule from the transformers library, which give a smooth warm-up followed by a slow decay of the learning rate.

In [0]:
optimizer = AdamW(bert_model.parameters(), lr=LR, correct_bias=False)
scheduler = WarmupLinearSchedule(
    optimizer,
    warmup_steps=train_optimization_step_amount * 0.05,
    t_total=train_optimization_step_amount,
)
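The shape of such a schedule can be written down in a few lines: the learning-rate multiplier rises linearly from 0 to 1 during warm-up and then falls linearly back to 0 at the last step. The function below is a sketch of that shape under these assumptions, not the library's source; its name and the example step counts are illustrative.

```python
def warmup_linear_multiplier(step, warmup_steps, total_steps):
    """Learning-rate multiplier: linear ramp 0 -> 1, then linear decay 1 -> 0."""
    if step < warmup_steps:
        return step / max(1.0, warmup_steps)
    return max(0.0, (total_steps - step) / max(1.0, total_steps - warmup_steps))


base_lr = 2e-5
total_steps = 920
warmup_steps = 46  # 5% of total_steps
for step in (0, 23, 46, 483, 920):
    # Effective learning rate at a few points along the schedule.
    print(step, base_lr * warmup_linear_multiplier(step, warmup_steps, total_steps))
```

The peak learning rate is reached exactly when warm-up ends, and the effective rate is half the peak both midway through warm-up and midway through the decay.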

For some parameter groups we set the regularization (weight decay) coefficients.

In [0]:
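param_optimizer = list(bert_model.named_parameters())
no_decay = ['bias', 'LayerNorm.bias', 'LayerNorm.weight']
optimizer_grouped_parameters = [
    {'params': [p for n, p in param_optimizer if not any(nd in n for nd in no_decay)], 'weight_decay': 0.01},
    {'params': [p for n, p in param_optimizer if any(nd in n for nd in no_decay)], 'weight_decay': 0.0}
]
# Note: these groups take effect only if they are passed to the optimizer,
# e.g. AdamW(optimizer_grouped_parameters, lr=LR, correct_bias=False);
# the optimizer above was created from bert_model.parameters().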
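The grouping rule is a plain substring test on parameter names: any name containing one of the `no_decay` markers is excluded from weight decay. The sketch below applies the same test to a few names; the helper `split_by_weight_decay` is hypothetical and the names are illustrative examples in the style of BERT parameter names.

```python
def split_by_weight_decay(parameter_names,
                          no_decay=("bias", "LayerNorm.bias", "LayerNorm.weight")):
    """Split parameter names into (decayed, not_decayed) by substring match."""
    decayed, not_decayed = [], []
    for name in parameter_names:
        if any(marker in name for marker in no_decay):
            not_decayed.append(name)
        else:
            decayed.append(name)
    return decayed, not_decayed


# Illustrative names only.
names = [
    "bert.encoder.layer.0.attention.self.query.weight",
    "bert.encoder.layer.0.attention.self.query.bias",
    "bert.embeddings.LayerNorm.weight",
    "classifier.weight",
]
decayed, not_decayed = split_by_weight_decay(names)
print(decayed)
print(not_decayed)
```

Biases and layer-normalization parameters are commonly left out of weight decay, which is what the two groups in the cell above express.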

## Part 4. Training BERT

Everything is now ready to fine-tune BERT on train_dataset!

Use dev_dataset to select the model's hyperparameters and training settings. The assignment earns full marks if accuracy on both dev_dataset and test_dataset is above 0.84.
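The accuracy threshold refers to the fraction of sentences whose predicted class, the index of the largest logit, matches the true label. A minimal sketch on plain lists follows; the function name is hypothetical, and in the notebook the logits would come from the model run in evaluation mode over dev_dataloader or test_dataloader.

```python
def accuracy_from_logits(logits, labels):
    """Fraction of rows whose highest-scoring class equals the label."""
    correct = 0
    for row, label in zip(logits, labels):
        predicted = max(range(len(row)), key=row.__getitem__)  # argmax over classes
        correct += int(predicted == label)
    return correct / len(labels)


# Toy logits for three sentences and three classes (made-up numbers).
toy_logits = [[0.1, 2.0, -1.0], [1.5, 0.2, 0.3], [0.0, 0.1, 0.9]]
toy_labels = [1, 0, 1]
print(accuracy_from_logits(toy_logits, toy_labels))
```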

In [0]:
torch.cuda.empty_cache()
!nvidia-smi

Mon Nov 18 16:13:51 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.50       Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  Tesla K80           Off  | 00000000:00:04.0 Off |                    0 |
| N/A   73C    P0    75W / 149W |    794MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
+-------

In [0]:

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')


In [0]:
import numpy as np

In [0]:
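bert_model.train()
history_loss = []
step = 0  # number of optimizer updates performed
num = 0   # number of batches processed (across all epochs)
optimizer.zero_grad()
for epoch in range(EPOCH_AMOUNT):
    for batch in tqdm(train_dataloader, desc="TRAINING "):
        input_ids, attention_mask, labels = batch
        # Pass the padding mask so that attention ignores padded positions
        out = bert_model(input_ids.to(device),
                         attention_mask=attention_mask.to(device),
                         labels=labels.to(device))
        loss, logits = out
        if (num + 1) % (ACCUMULATION_STEPS * 10) == 0:
            print("epoch {} step {} loss ={} ".format(epoch + 1, num, round(loss.item(), 5)))
        # Scale the loss so the accumulated gradient is an average over the micro-batches
        (loss / ACCUMULATION_STEPS).backward()
        history_loss.append(loss.item())
        num += 1
        # Update the weights once every ACCUMULATION_STEPS batches
        if num % ACCUMULATION_STEPS == 0:
            optimizer.step()
            scheduler.step()
            optimizer.zero_grad()
            step += 1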
    






TRANING :   0%|          | 0/1844 [00:00<?, ?it/s][A[A

TRANING :   0%|          | 1/1844 [00:00<13:15,  2.32it/s][A[A

TRANING :   0%|          | 2/1844 [00:00<14:11,  2.16it/s][A[A

TRANING :   0%|          | 3/1844 [00:01<13:20,  2.30it/s][A[A

TRANING :   0%|          | 4/1844 [00:01<14:38,  2.09it/s][A[A

TRANING :   0%|          | 5/1844 [00:02<13:34,  2.26it/s][A[A

TRANING :   0%|          | 6/1844 [00:02<12:23,  2.47it/s][A[A

TRANING :   0%|          | 7/1844 [00:03<12:46,  2.40it/s][A[A

TRANING :   0%|          | 8/1844 [00:03<13:47,  2.22it/s][A[A

TRANING :   0%|          | 9/1844 [00:04<14:39,  2.09it/s][A[A

TRANING :   1%|          | 10/1844 [00:04<13:50,  2.21it/s][A[A

TRANING :   1%|          | 11/1844 [00:04<13:29,  2.27it/s][A[A

TRANING :   1%|          | 12/1844 [00:05<13:35,  2.25it/s][A[A

TRANING :   1%|          | 13/1844 [00:06<15:28,  1.97it/s][A[A

TRANING :   1%|          | 14/1844 [00:06<14:44,  2.07it/s][A[A

TRANING :  

epoch 1 step 39 loss =0.34118 




TRANING :   2%|▏         | 40/1844 [00:19<16:02,  1.87it/s][A[A

TRANING :   2%|▏         | 41/1844 [00:20<15:03,  1.99it/s][A[A

TRANING :   2%|▏         | 42/1844 [00:21<16:22,  1.83it/s][A[A

TRANING :   2%|▏         | 43/1844 [00:21<15:37,  1.92it/s][A[A

TRANING :   2%|▏         | 44/1844 [00:22<15:19,  1.96it/s][A[A

TRANING :   2%|▏         | 45/1844 [00:22<14:16,  2.10it/s][A[A

TRANING :   2%|▏         | 46/1844 [00:23<15:32,  1.93it/s][A[A

TRANING :   3%|▎         | 47/1844 [00:23<16:46,  1.78it/s][A[A

TRANING :   3%|▎         | 48/1844 [00:24<15:44,  1.90it/s][A[A

TRANING :   3%|▎         | 49/1844 [00:24<14:20,  2.09it/s][A[A

TRANING :   3%|▎         | 50/1844 [00:24<14:28,  2.07it/s][A[A

TRANING :   3%|▎         | 51/1844 [00:25<14:20,  2.08it/s][A[A

TRANING :   3%|▎         | 52/1844 [00:26<15:08,  1.97it/s][A[A

TRANING :   3%|▎         | 53/1844 [00:26<14:54,  2.00it/s][A[A

TRANING :   3%|▎         | 54/1844 [00:27<14:52,  2.01it/s]

epoch 1 step 79 loss =0.14758 




TRANING :   4%|▍         | 80/1844 [00:39<15:16,  1.92it/s][A[A

TRANING :   4%|▍         | 81/1844 [00:40<13:41,  2.15it/s][A[A

TRANING :   4%|▍         | 82/1844 [00:40<15:04,  1.95it/s][A[A

TRANING :   5%|▍         | 83/1844 [00:41<15:40,  1.87it/s][A[A

TRANING :   5%|▍         | 84/1844 [00:41<14:36,  2.01it/s][A[A

TRANING :   5%|▍         | 85/1844 [00:42<14:41,  2.00it/s][A[A

TRANING :   5%|▍         | 86/1844 [00:42<15:03,  1.95it/s][A[A

TRANING :   5%|▍         | 87/1844 [00:43<15:03,  1.94it/s][A[A

TRANING :   5%|▍         | 88/1844 [00:44<16:35,  1.76it/s][A[A

TRANING :   5%|▍         | 89/1844 [00:44<15:01,  1.95it/s][A[A

TRANING :   5%|▍         | 90/1844 [00:44<13:50,  2.11it/s][A[A

TRANING :   5%|▍         | 91/1844 [00:45<13:39,  2.14it/s][A[A

TRANING :   5%|▍         | 92/1844 [00:46<16:56,  1.72it/s][A[A

TRANING :   5%|▌         | 93/1844 [00:47<19:04,  1.53it/s][A[A

TRANING :   5%|▌         | 94/1844 [00:47<16:44,  1.74it/s]

epoch 1 step 119 loss =0.24367 




TRANING :   7%|▋         | 120/1844 [01:00<15:38,  1.84it/s][A[A

TRANING :   7%|▋         | 121/1844 [01:01<15:32,  1.85it/s][A[A

TRANING :   7%|▋         | 122/1844 [01:02<16:10,  1.77it/s][A[A

TRANING :   7%|▋         | 123/1844 [01:02<14:43,  1.95it/s][A[A

TRANING :   7%|▋         | 124/1844 [01:02<14:05,  2.03it/s][A[A

TRANING :   7%|▋         | 125/1844 [01:03<12:26,  2.30it/s][A[A

TRANING :   7%|▋         | 126/1844 [01:03<11:28,  2.50it/s][A[A

TRANING :   7%|▋         | 127/1844 [01:03<11:17,  2.53it/s][A[A

TRANING :   7%|▋         | 128/1844 [01:04<11:01,  2.59it/s][A[A

TRANING :   7%|▋         | 129/1844 [01:04<11:34,  2.47it/s][A[A

TRANING :   7%|▋         | 130/1844 [01:04<10:37,  2.69it/s][A[A

TRANING :   7%|▋         | 131/1844 [01:05<15:49,  1.80it/s][A[A

TRANING :   7%|▋         | 132/1844 [01:06<14:55,  1.91it/s][A[A

TRANING :   7%|▋         | 133/1844 [01:06<15:14,  1.87it/s][A[A

TRANING :   7%|▋         | 134/1844 [01:07<15:

epoch 1 step 159 loss =0.3543 




TRANING :   9%|▊         | 161/1844 [01:21<13:47,  2.03it/s][A[A

TRANING :   9%|▉         | 162/1844 [01:22<13:50,  2.03it/s][A[A

TRANING :   9%|▉         | 163/1844 [01:22<13:14,  2.12it/s][A[A

TRANING :   9%|▉         | 164/1844 [01:23<12:31,  2.23it/s][A[A

TRANING :   9%|▉         | 165/1844 [01:23<13:02,  2.15it/s][A[A

TRANING :   9%|▉         | 166/1844 [01:24<13:59,  2.00it/s][A[A

TRANING :   9%|▉         | 167/1844 [01:24<14:59,  1.86it/s][A[A

TRANING :   9%|▉         | 168/1844 [01:25<14:16,  1.96it/s][A[A

TRANING :   9%|▉         | 169/1844 [01:25<13:08,  2.13it/s][A[A

TRANING :   9%|▉         | 170/1844 [01:26<15:14,  1.83it/s][A[A

TRANING :   9%|▉         | 171/1844 [01:26<13:48,  2.02it/s][A[A

TRANING :   9%|▉         | 172/1844 [01:27<13:22,  2.08it/s][A[A

TRANING :   9%|▉         | 173/1844 [01:27<12:50,  2.17it/s][A[A

TRANING :   9%|▉         | 174/1844 [01:28<15:00,  1.86it/s][A[A

TRANING :   9%|▉         | 175/1844 [01:28<14:

epoch 1 step 199 loss =0.68629 




TRANING :  11%|█         | 200/1844 [01:41<12:10,  2.25it/s][A[A

TRANING :  11%|█         | 201/1844 [01:42<11:32,  2.37it/s][A[A

TRANING :  11%|█         | 202/1844 [01:42<11:27,  2.39it/s][A[A

TRANING :  11%|█         | 203/1844 [01:43<11:25,  2.39it/s][A[A

TRANING :  11%|█         | 204/1844 [01:43<13:42,  2.00it/s][A[A

TRANING :  11%|█         | 205/1844 [01:44<13:44,  1.99it/s][A[A

TRANING :  11%|█         | 206/1844 [01:44<14:36,  1.87it/s][A[A

TRANING :  11%|█         | 207/1844 [01:45<13:13,  2.06it/s][A[A

TRANING :  11%|█▏        | 208/1844 [01:45<13:37,  2.00it/s][A[A

TRANING :  11%|█▏        | 209/1844 [01:46<12:25,  2.19it/s][A[A

TRANING :  11%|█▏        | 210/1844 [01:46<12:53,  2.11it/s][A[A

TRANING :  11%|█▏        | 211/1844 [01:47<14:08,  1.93it/s][A[A

TRANING :  11%|█▏        | 212/1844 [01:47<13:11,  2.06it/s][A[A

TRANING :  12%|█▏        | 213/1844 [01:48<13:21,  2.04it/s][A[A

TRANING :  12%|█▏        | 214/1844 [01:48<13:

epoch 1 step 239 loss =0.20597 




TRANING :  13%|█▎        | 240/1844 [02:01<13:03,  2.05it/s][A[A

TRANING :  13%|█▎        | 241/1844 [02:02<12:29,  2.14it/s][A[A

TRANING :  13%|█▎        | 242/1844 [02:02<12:20,  2.16it/s][A[A

TRANING :  13%|█▎        | 243/1844 [02:03<12:59,  2.05it/s][A[A

TRANING :  13%|█▎        | 244/1844 [02:03<12:42,  2.10it/s][A[A

TRANING :  13%|█▎        | 245/1844 [02:04<13:25,  1.98it/s][A[A

TRANING :  13%|█▎        | 246/1844 [02:04<13:32,  1.97it/s][A[A

TRANING :  13%|█▎        | 247/1844 [02:05<13:10,  2.02it/s][A[A

TRANING :  13%|█▎        | 248/1844 [02:05<13:28,  1.97it/s][A[A

TRANING :  14%|█▎        | 249/1844 [02:06<13:45,  1.93it/s][A[A

TRANING :  14%|█▎        | 250/1844 [02:06<13:17,  2.00it/s][A[A

TRANING :  14%|█▎        | 251/1844 [02:07<13:24,  1.98it/s][A[A

TRANING :  14%|█▎        | 252/1844 [02:07<12:09,  2.18it/s][A[A

TRANING :  14%|█▎        | 253/1844 [02:08<12:54,  2.05it/s][A[A

TRANING :  14%|█▍        | 254/1844 [02:08<15:

epoch 1 step 279 loss =0.25112 




TRANING :  15%|█▌        | 280/1844 [02:22<12:12,  2.14it/s][A[A

TRANING :  15%|█▌        | 281/1844 [02:23<16:18,  1.60it/s][A[A

TRANING :  15%|█▌        | 282/1844 [02:23<14:41,  1.77it/s][A[A

TRANING :  15%|█▌        | 283/1844 [02:24<13:08,  1.98it/s][A[A

TRANING :  15%|█▌        | 284/1844 [02:25<14:40,  1.77it/s][A[A

TRANING :  15%|█▌        | 285/1844 [02:25<15:14,  1.70it/s][A[A

TRANING :  16%|█▌        | 286/1844 [02:26<17:41,  1.47it/s][A[A

TRANING :  16%|█▌        | 287/1844 [02:27<19:16,  1.35it/s][A[A

TRANING :  16%|█▌        | 288/1844 [02:28<20:26,  1.27it/s][A[A

TRANING :  16%|█▌        | 289/1844 [02:28<16:49,  1.54it/s][A[A

TRANING :  16%|█▌        | 290/1844 [02:29<15:14,  1.70it/s][A[A

TRANING :  16%|█▌        | 291/1844 [02:29<15:04,  1.72it/s][A[A

TRANING :  16%|█▌        | 292/1844 [02:30<14:04,  1.84it/s][A[A

TRANING :  16%|█▌        | 293/1844 [02:30<13:59,  1.85it/s][A[A

TRANING :  16%|█▌        | 294/1844 [02:31<13:

epoch 1 step 319 loss =0.18929 




TRANING :  17%|█▋        | 320/1844 [02:44<15:38,  1.62it/s][A[A

TRANING :  17%|█▋        | 321/1844 [02:45<14:09,  1.79it/s][A[A

TRANING :  17%|█▋        | 322/1844 [02:45<14:32,  1.75it/s][A[A

TRANING :  18%|█▊        | 323/1844 [02:46<13:17,  1.91it/s][A[A

TRANING :  18%|█▊        | 324/1844 [02:46<12:35,  2.01it/s][A[A

TRANING :  18%|█▊        | 325/1844 [02:46<11:38,  2.17it/s][A[A

TRANING :  18%|█▊        | 326/1844 [02:47<11:22,  2.22it/s][A[A

TRANING :  18%|█▊        | 327/1844 [02:47<12:21,  2.05it/s][A[A

TRANING :  18%|█▊        | 328/1844 [02:48<10:54,  2.31it/s][A[A

TRANING :  18%|█▊        | 329/1844 [02:48<10:47,  2.34it/s][A[A

TRANING :  18%|█▊        | 330/1844 [02:49<13:19,  1.89it/s][A[A

TRANING :  18%|█▊        | 331/1844 [02:49<12:45,  1.98it/s][A[A

TRANING :  18%|█▊        | 332/1844 [02:50<14:34,  1.73it/s][A[A

TRANING :  18%|█▊        | 333/1844 [02:50<13:03,  1.93it/s][A[A

TRANING :  18%|█▊        | 334/1844 [02:51<14:

epoch 1 step 359 loss =0.43089 




TRANING :  20%|█▉        | 360/1844 [03:06<14:38,  1.69it/s][A[A

TRANING :  20%|█▉        | 361/1844 [03:06<13:40,  1.81it/s][A[A

TRANING :  20%|█▉        | 362/1844 [03:07<12:09,  2.03it/s][A[A

TRANING :  20%|█▉        | 363/1844 [03:07<11:23,  2.17it/s][A[A

TRANING :  20%|█▉        | 364/1844 [03:08<12:22,  1.99it/s][A[A

TRANING :  20%|█▉        | 365/1844 [03:08<12:27,  1.98it/s][A[A

TRANING :  20%|█▉        | 366/1844 [03:09<12:27,  1.98it/s][A[A

TRANING :  20%|█▉        | 367/1844 [03:09<11:25,  2.15it/s][A[A

TRANING :  20%|█▉        | 368/1844 [03:09<12:27,  1.97it/s][A[A

TRANING :  20%|██        | 369/1844 [03:10<11:57,  2.06it/s][A[A

TRANING :  20%|██        | 370/1844 [03:11<13:19,  1.84it/s][A[A

TRANING :  20%|██        | 371/1844 [03:11<13:01,  1.88it/s][A[A

TRANING :  20%|██        | 372/1844 [03:11<11:50,  2.07it/s][A[A

TRANING :  20%|██        | 373/1844 [03:12<11:18,  2.17it/s][A[A

TRANING :  20%|██        | 374/1844 [03:12<12:

epoch 1 step 399 loss =0.95234
epoch 1 step 439 loss =0.2462
epoch 1 step 479 loss =0.3527
epoch 1 step 519 loss =0.83487
epoch 1 step 559 loss =0.17391
epoch 1 step 599 loss =0.55638
epoch 1 step 639 loss =0.1138
epoch 1 step 679 loss =0.33442
epoch 1 step 719 loss =0.04044
epoch 1 step 759 loss =0.15919
epoch 1 step 799 loss =0.17566
epoch 1 step 839 loss =0.10813
epoch 1 step 879 loss =0.29834
epoch 1 step 919 loss =0.2442
epoch 1 step 959 loss =0.54532
epoch 1 step 999 loss =0.50935
epoch 1 step 1039 loss =0.09404
epoch 1 step 1079 loss =0.12798
epoch 1 step 1119 loss =0.1323
epoch 1 step 1159 loss =0.08727
epoch 1 step 1199 loss =0.18665
epoch 1 step 1239 loss =0.74587
epoch 1 step 1279 loss =0.55908
epoch 1 step 1319 loss =0.09427
epoch 1 step 1359 loss =0.2547
epoch 1 step 1399 loss =0.4743
epoch 1 step 1439 loss =0.23635
epoch 1 step 1479 loss =0.62736
epoch 1 step 1519 loss =0.3372
epoch 1 step 1559 loss =0.30518
epoch 1 step 1599 loss =0.41754
epoch 1 step 1639 loss =0.61365
epoch 1 step 1679 loss =0.16934 




epoch 1 step 1719 loss =0.06261 




epoch 1 step 1759 loss =0.33312 




epoch 1 step 1799 loss =0.55094 




epoch 1 step 1839 loss =0.24669 




epoch 2 step 1879 loss =0.09051 




epoch 2 step 1919 loss =0.21955 




epoch 2 step 1959 loss =0.65686 




epoch 2 step 1999 loss =0.20768 




epoch 2 step 2039 loss =0.07784 




epoch 2 step 2079 loss =0.16432 




epoch 2 step 2119 loss =0.30132 




epoch 2 step 2159 loss =0.3019 




epoch 2 step 2199 loss =0.16951 




epoch 2 step 2239 loss =0.40292 




epoch 2 step 2279 loss =0.56203 




epoch 2 step 2319 loss =0.38768 




epoch 2 step 2359 loss =0.27601 




epoch 2 step 2399 loss =0.05884 




epoch 2 step 2439 loss =0.24889 




epoch 2 step 2479 loss =0.21337 




epoch 2 step 2519 loss =0.45859 




epoch 2 step 2559 loss =0.15922 




epoch 2 step 2599 loss =0.15491 




epoch 2 step 2639 loss =0.26524 




epoch 2 step 2679 loss =0.65986 




epoch 2 step 2719 loss =0.24213 




epoch 2 step 2759 loss =0.56727 




epoch 2 step 2799 loss =0.31199 




epoch 2 step 2839 loss =0.31337 




epoch 2 step 2879 loss =0.25198 




TRANING :  56%|█████▌    | 1036/1844 [08:54<06:27,  2.09it/s][A[A

TRANING :  56%|█████▌    | 1037/1844 [08:54<06:46,  1.99it/s][A[A

TRANING :  56%|█████▋    | 1038/1844 [08:55<06:10,  2.18it/s][A[A

TRANING :  56%|█████▋    | 1039/1844 [08:56<07:56,  1.69it/s][A[A

TRANING :  56%|█████▋    | 1040/1844 [08:57<09:30,  1.41it/s][A[A

TRANING :  56%|█████▋    | 1041/1844 [08:57<07:55,  1.69it/s][A[A

TRANING :  57%|█████▋    | 1042/1844 [08:57<07:43,  1.73it/s][A[A

TRANING :  57%|█████▋    | 1043/1844 [08:58<07:16,  1.83it/s][A[A

TRANING :  57%|█████▋    | 1044/1844 [08:58<06:49,  1.95it/s][A[A

TRANING :  57%|█████▋    | 1045/1844 [08:59<06:09,  2.16it/s][A[A

TRANING :  57%|█████▋    | 1046/1844 [08:59<05:58,  2.23it/s][A[A

TRANING :  57%|█████▋    | 1047/1844 [08:59<05:50,  2.27it/s][A[A

TRANING :  57%|█████▋    | 1048/1844 [09:00<05:38,  2.35it/s][A[A

TRANING :  57%|█████▋    | 1049/1844 [09:00<05:08,  2.58it/s][A[A

TRANING :  57%|█████▋    | 1050/

epoch 2 step 2919 loss =0.13077 




TRANING :  58%|█████▊    | 1076/1844 [09:15<07:25,  1.72it/s][A[A

TRANING :  58%|█████▊    | 1077/1844 [09:16<06:48,  1.88it/s][A[A

TRANING :  58%|█████▊    | 1078/1844 [09:16<06:28,  1.97it/s][A[A

TRANING :  59%|█████▊    | 1079/1844 [09:17<07:15,  1.76it/s][A[A

TRANING :  59%|█████▊    | 1080/1844 [09:17<07:10,  1.77it/s][A[A

TRANING :  59%|█████▊    | 1081/1844 [09:18<06:51,  1.85it/s][A[A

TRANING :  59%|█████▊    | 1082/1844 [09:18<06:23,  1.99it/s][A[A

TRANING :  59%|█████▊    | 1083/1844 [09:19<06:11,  2.05it/s][A[A

TRANING :  59%|█████▉    | 1084/1844 [09:19<05:58,  2.12it/s][A[A

TRANING :  59%|█████▉    | 1085/1844 [09:20<06:42,  1.88it/s][A[A

TRANING :  59%|█████▉    | 1086/1844 [09:21<07:49,  1.61it/s][A[A

TRANING :  59%|█████▉    | 1087/1844 [09:21<07:20,  1.72it/s][A[A

TRANING :  59%|█████▉    | 1088/1844 [09:22<07:32,  1.67it/s][A[A

TRANING :  59%|█████▉    | 1089/1844 [09:22<07:19,  1.72it/s][A[A

TRANING :  59%|█████▉    | 1090/

epoch 2 step 2959 loss =0.2734 




TRANING :  61%|██████    | 1116/1844 [09:35<05:44,  2.11it/s][A[A

TRANING :  61%|██████    | 1117/1844 [09:36<06:04,  1.99it/s][A[A

TRANING :  61%|██████    | 1118/1844 [09:36<06:29,  1.86it/s][A[A

TRANING :  61%|██████    | 1119/1844 [09:37<06:05,  1.99it/s][A[A

TRANING :  61%|██████    | 1120/1844 [09:37<05:50,  2.07it/s][A[A

TRANING :  61%|██████    | 1121/1844 [09:38<05:27,  2.21it/s][A[A

TRANING :  61%|██████    | 1122/1844 [09:38<05:28,  2.19it/s][A[A

TRANING :  61%|██████    | 1123/1844 [09:38<05:35,  2.15it/s][A[A

TRANING :  61%|██████    | 1124/1844 [09:39<06:02,  1.99it/s][A[A

TRANING :  61%|██████    | 1125/1844 [09:39<05:30,  2.18it/s][A[A

TRANING :  61%|██████    | 1126/1844 [09:40<06:15,  1.91it/s][A[A

TRANING :  61%|██████    | 1127/1844 [09:41<07:26,  1.61it/s][A[A

TRANING :  61%|██████    | 1128/1844 [09:42<07:15,  1.65it/s][A[A

TRANING :  61%|██████    | 1129/1844 [09:42<08:37,  1.38it/s][A[A

TRANING :  61%|██████▏   | 1130/

epoch 2 step 2999 loss =0.24056 




TRANING :  63%|██████▎   | 1156/1844 [09:55<05:35,  2.05it/s][A[A

TRANING :  63%|██████▎   | 1157/1844 [09:56<05:23,  2.13it/s][A[A

TRANING :  63%|██████▎   | 1158/1844 [09:56<05:19,  2.15it/s][A[A

TRANING :  63%|██████▎   | 1159/1844 [09:56<04:58,  2.30it/s][A[A

TRANING :  63%|██████▎   | 1160/1844 [09:57<05:22,  2.12it/s][A[A

TRANING :  63%|██████▎   | 1161/1844 [09:58<05:57,  1.91it/s][A[A

TRANING :  63%|██████▎   | 1162/1844 [09:58<05:20,  2.13it/s][A[A

TRANING :  63%|██████▎   | 1163/1844 [09:59<05:34,  2.04it/s][A[A

TRANING :  63%|██████▎   | 1164/1844 [09:59<05:05,  2.23it/s][A[A

TRANING :  63%|██████▎   | 1165/1844 [09:59<04:45,  2.38it/s][A[A

TRANING :  63%|██████▎   | 1166/1844 [10:00<04:56,  2.29it/s][A[A

TRANING :  63%|██████▎   | 1167/1844 [10:00<05:30,  2.05it/s][A[A

TRANING :  63%|██████▎   | 1168/1844 [10:01<05:30,  2.04it/s][A[A

TRANING :  63%|██████▎   | 1169/1844 [10:01<05:22,  2.09it/s][A[A

TRANING :  63%|██████▎   | 1170/

epoch 2 step 3039 loss =0.29575 




TRANING :  65%|██████▍   | 1196/1844 [10:16<06:10,  1.75it/s][A[A

TRANING :  65%|██████▍   | 1197/1844 [10:17<05:42,  1.89it/s][A[A

TRANING :  65%|██████▍   | 1198/1844 [10:17<05:34,  1.93it/s][A[A

TRANING :  65%|██████▌   | 1199/1844 [10:18<05:06,  2.10it/s][A[A

TRANING :  65%|██████▌   | 1200/1844 [10:19<06:30,  1.65it/s][A[A

TRANING :  65%|██████▌   | 1201/1844 [10:19<06:13,  1.72it/s][A[A

TRANING :  65%|██████▌   | 1202/1844 [10:20<05:50,  1.83it/s][A[A

TRANING :  65%|██████▌   | 1203/1844 [10:20<05:19,  2.01it/s][A[A

TRANING :  65%|██████▌   | 1204/1844 [10:20<05:16,  2.02it/s][A[A

TRANING :  65%|██████▌   | 1205/1844 [10:21<04:45,  2.24it/s][A[A

TRANING :  65%|██████▌   | 1206/1844 [10:21<05:35,  1.90it/s][A[A

TRANING :  65%|██████▌   | 1207/1844 [10:22<05:34,  1.90it/s][A[A

TRANING :  66%|██████▌   | 1208/1844 [10:22<05:04,  2.09it/s][A[A

TRANING :  66%|██████▌   | 1209/1844 [10:23<04:40,  2.27it/s][A[A

TRANING :  66%|██████▌   | 1210/

epoch 2 step 3079 loss =0.79213 




TRANING :  67%|██████▋   | 1236/1844 [10:37<04:41,  2.16it/s][A[A

TRANING :  67%|██████▋   | 1237/1844 [10:37<04:50,  2.09it/s][A[A

TRANING :  67%|██████▋   | 1238/1844 [10:38<05:40,  1.78it/s][A[A

TRANING :  67%|██████▋   | 1239/1844 [10:38<05:23,  1.87it/s][A[A

TRANING :  67%|██████▋   | 1240/1844 [10:39<05:39,  1.78it/s][A[A

TRANING :  67%|██████▋   | 1241/1844 [10:39<05:14,  1.92it/s][A[A

TRANING :  67%|██████▋   | 1242/1844 [10:40<04:36,  2.18it/s][A[A

TRANING :  67%|██████▋   | 1243/1844 [10:40<04:34,  2.19it/s][A[A

TRANING :  67%|██████▋   | 1244/1844 [10:41<04:36,  2.17it/s][A[A

TRANING :  68%|██████▊   | 1245/1844 [10:41<04:49,  2.07it/s][A[A

TRANING :  68%|██████▊   | 1246/1844 [10:42<05:07,  1.94it/s][A[A

TRANING :  68%|██████▊   | 1247/1844 [10:42<05:13,  1.90it/s][A[A

TRANING :  68%|██████▊   | 1248/1844 [10:43<05:11,  1.91it/s][A[A

TRANING :  68%|██████▊   | 1249/1844 [10:43<05:32,  1.79it/s][A[A

TRANING :  68%|██████▊   | 1250/

epoch 2 step 3119 loss =0.29128 




TRANING :  69%|██████▉   | 1277/1844 [10:58<04:50,  1.95it/s][A[A

TRANING :  69%|██████▉   | 1278/1844 [10:58<04:27,  2.11it/s][A[A

TRANING :  69%|██████▉   | 1279/1844 [10:59<04:23,  2.14it/s][A[A

TRANING :  69%|██████▉   | 1280/1844 [10:59<04:03,  2.32it/s][A[A

TRANING :  69%|██████▉   | 1281/1844 [11:00<04:48,  1.95it/s][A[A

TRANING :  70%|██████▉   | 1282/1844 [11:00<04:31,  2.07it/s][A[A

TRANING :  70%|██████▉   | 1283/1844 [11:01<04:25,  2.11it/s][A[A

TRANING :  70%|██████▉   | 1284/1844 [11:01<04:32,  2.05it/s][A[A

TRANING :  70%|██████▉   | 1285/1844 [11:02<04:30,  2.06it/s][A[A

TRANING :  70%|██████▉   | 1286/1844 [11:02<04:33,  2.04it/s][A[A

TRANING :  70%|██████▉   | 1287/1844 [11:03<05:16,  1.76it/s][A[A

TRANING :  70%|██████▉   | 1288/1844 [11:04<05:06,  1.81it/s][A[A

TRANING :  70%|██████▉   | 1289/1844 [11:04<05:22,  1.72it/s][A[A

TRANING :  70%|██████▉   | 1290/1844 [11:05<05:23,  1.71it/s][A[A

TRANING :  70%|███████   | 1291/

epoch 2 step 3159 loss =0.10879 




TRANING :  71%|███████▏  | 1316/1844 [11:19<05:07,  1.72it/s][A[A

TRANING :  71%|███████▏  | 1317/1844 [11:19<04:47,  1.83it/s][A[A

TRANING :  71%|███████▏  | 1318/1844 [11:20<04:45,  1.84it/s][A[A

TRANING :  72%|███████▏  | 1319/1844 [11:20<04:38,  1.88it/s][A[A

TRANING :  72%|███████▏  | 1320/1844 [11:21<04:24,  1.98it/s][A[A

TRANING :  72%|███████▏  | 1321/1844 [11:21<04:11,  2.08it/s][A[A

TRANING :  72%|███████▏  | 1322/1844 [11:22<04:41,  1.85it/s][A[A

TRANING :  72%|███████▏  | 1323/1844 [11:23<05:29,  1.58it/s][A[A

TRANING :  72%|███████▏  | 1324/1844 [11:23<05:12,  1.66it/s][A[A

TRANING :  72%|███████▏  | 1325/1844 [11:24<04:56,  1.75it/s][A[A

TRANING :  72%|███████▏  | 1326/1844 [11:24<04:33,  1.89it/s][A[A

TRANING :  72%|███████▏  | 1327/1844 [11:24<04:22,  1.97it/s][A[A

TRANING :  72%|███████▏  | 1328/1844 [11:25<04:29,  1.91it/s][A[A

TRANING :  72%|███████▏  | 1329/1844 [11:26<04:33,  1.88it/s][A[A

TRANING :  72%|███████▏  | 1330/

epoch 2 step 3199 loss =0.39553 




TRANING :  74%|███████▎  | 1356/1844 [11:39<04:11,  1.94it/s][A[A

TRANING :  74%|███████▎  | 1357/1844 [11:39<03:37,  2.24it/s][A[A

TRANING :  74%|███████▎  | 1358/1844 [11:40<03:45,  2.15it/s][A[A

TRANING :  74%|███████▎  | 1359/1844 [11:41<04:25,  1.82it/s][A[A

TRANING :  74%|███████▍  | 1360/1844 [11:41<04:26,  1.82it/s][A[A

TRANING :  74%|███████▍  | 1361/1844 [11:42<04:18,  1.87it/s][A[A

TRANING :  74%|███████▍  | 1362/1844 [11:42<04:00,  2.00it/s][A[A

TRANING :  74%|███████▍  | 1363/1844 [11:43<04:04,  1.97it/s][A[A

TRANING :  74%|███████▍  | 1364/1844 [11:43<04:07,  1.94it/s][A[A

TRANING :  74%|███████▍  | 1365/1844 [11:44<04:27,  1.79it/s][A[A

TRANING :  74%|███████▍  | 1366/1844 [11:44<04:00,  1.99it/s][A[A

TRANING :  74%|███████▍  | 1367/1844 [11:45<03:51,  2.06it/s][A[A

TRANING :  74%|███████▍  | 1368/1844 [11:45<04:01,  1.97it/s][A[A

TRANING :  74%|███████▍  | 1369/1844 [11:46<04:00,  1.97it/s][A[A

TRANING :  74%|███████▍  | 1370/

epoch 2 step 3239 loss =0.13973 




TRANING :  76%|███████▌  | 1397/1844 [12:01<03:22,  2.21it/s][A[A

TRANING :  76%|███████▌  | 1398/1844 [12:01<03:06,  2.39it/s][A[A

TRANING :  76%|███████▌  | 1399/1844 [12:02<03:21,  2.20it/s][A[A

TRANING :  76%|███████▌  | 1400/1844 [12:02<03:50,  1.93it/s][A[A

TRANING :  76%|███████▌  | 1401/1844 [12:03<03:51,  1.91it/s][A[A

TRANING :  76%|███████▌  | 1402/1844 [12:03<03:32,  2.08it/s][A[A

TRANING :  76%|███████▌  | 1403/1844 [12:04<03:43,  1.98it/s][A[A

TRANING :  76%|███████▌  | 1404/1844 [12:04<03:35,  2.04it/s][A[A

TRANING :  76%|███████▌  | 1405/1844 [12:05<03:45,  1.95it/s][A[A

TRANING :  76%|███████▌  | 1406/1844 [12:05<03:19,  2.19it/s][A[A

TRANING :  76%|███████▋  | 1407/1844 [12:06<03:04,  2.37it/s][A[A

TRANING :  76%|███████▋  | 1408/1844 [12:06<03:12,  2.26it/s][A[A

TRANING :  76%|███████▋  | 1409/1844 [12:07<03:15,  2.23it/s][A[A

TRANING :  76%|███████▋  | 1410/1844 [12:07<03:12,  2.26it/s][A[A

TRANING :  77%|███████▋  | 1411/

epoch 2 step 3279 loss =0.46953 




TRANING :  78%|███████▊  | 1436/1844 [12:20<03:24,  2.00it/s][A[A

TRANING :  78%|███████▊  | 1437/1844 [12:21<03:37,  1.87it/s][A[A

TRANING :  78%|███████▊  | 1438/1844 [12:21<03:24,  1.99it/s][A[A

TRANING :  78%|███████▊  | 1439/1844 [12:22<03:32,  1.90it/s][A[A

TRANING :  78%|███████▊  | 1440/1844 [12:22<03:25,  1.97it/s][A[A

TRANING :  78%|███████▊  | 1441/1844 [12:22<03:06,  2.16it/s][A[A

TRANING :  78%|███████▊  | 1442/1844 [12:23<03:10,  2.11it/s][A[A

TRANING :  78%|███████▊  | 1443/1844 [12:24<03:18,  2.02it/s][A[A

TRANING :  78%|███████▊  | 1444/1844 [12:24<03:13,  2.07it/s][A[A

TRANING :  78%|███████▊  | 1445/1844 [12:25<03:31,  1.89it/s][A[A

TRANING :  78%|███████▊  | 1446/1844 [12:26<04:19,  1.53it/s][A[A

TRANING :  78%|███████▊  | 1447/1844 [12:26<04:06,  1.61it/s][A[A

TRANING :  79%|███████▊  | 1448/1844 [12:27<03:46,  1.75it/s][A[A

TRANING :  79%|███████▊  | 1449/1844 [12:27<03:28,  1.89it/s][A[A

TRANING :  79%|███████▊  | 1450/

epoch 2 step 3319 loss =0.18984 




TRANING :  80%|████████  | 1476/1844 [12:42<02:54,  2.10it/s][A[A

TRANING :  80%|████████  | 1477/1844 [12:42<02:43,  2.25it/s][A[A

TRANING :  80%|████████  | 1478/1844 [12:43<02:28,  2.46it/s][A[A

TRANING :  80%|████████  | 1479/1844 [12:43<02:47,  2.18it/s][A[A

TRANING :  80%|████████  | 1480/1844 [12:44<02:53,  2.10it/s][A[A

TRANING :  80%|████████  | 1481/1844 [12:44<02:38,  2.29it/s][A[A

TRANING :  80%|████████  | 1482/1844 [12:44<02:36,  2.32it/s][A[A

TRANING :  80%|████████  | 1483/1844 [12:45<02:30,  2.39it/s][A[A

TRANING :  80%|████████  | 1484/1844 [12:46<03:26,  1.74it/s][A[A

TRANING :  81%|████████  | 1485/1844 [12:46<02:59,  2.00it/s][A[A

TRANING :  81%|████████  | 1486/1844 [12:47<03:02,  1.96it/s][A[A

TRANING :  81%|████████  | 1487/1844 [12:47<02:57,  2.01it/s][A[A

TRANING :  81%|████████  | 1488/1844 [12:48<02:56,  2.02it/s][A[A

TRANING :  81%|████████  | 1489/1844 [12:48<02:47,  2.12it/s][A[A

TRANING :  81%|████████  | 1490/

epoch 2 step 3359 loss =0.57502 




TRANING :  82%|████████▏ | 1516/1844 [13:02<02:51,  1.92it/s][A[A

TRANING :  82%|████████▏ | 1517/1844 [13:03<02:55,  1.86it/s][A[A

TRANING :  82%|████████▏ | 1518/1844 [13:04<03:40,  1.48it/s][A[A

TRANING :  82%|████████▏ | 1519/1844 [13:04<03:23,  1.60it/s][A[A

TRANING :  82%|████████▏ | 1520/1844 [13:05<03:07,  1.73it/s][A[A

TRANING :  82%|████████▏ | 1521/1844 [13:05<02:51,  1.88it/s][A[A

TRANING :  83%|████████▎ | 1522/1844 [13:06<02:30,  2.14it/s][A[A

TRANING :  83%|████████▎ | 1523/1844 [13:06<02:37,  2.03it/s][A[A

TRANING :  83%|████████▎ | 1524/1844 [13:07<02:33,  2.09it/s][A[A

TRANING :  83%|████████▎ | 1525/1844 [13:07<02:28,  2.15it/s][A[A

TRANING :  83%|████████▎ | 1526/1844 [13:08<02:28,  2.15it/s][A[A

TRANING :  83%|████████▎ | 1527/1844 [13:09<03:39,  1.44it/s][A[A

TRANING :  83%|████████▎ | 1528/1844 [13:09<03:25,  1.54it/s][A[A

TRANING :  83%|████████▎ | 1529/1844 [13:10<03:10,  1.66it/s][A[A

TRANING :  83%|████████▎ | 1530/

epoch 2 step 3399 loss =0.67837 




TRANING :  84%|████████▍ | 1556/1844 [13:23<02:24,  1.99it/s][A[A

TRANING :  84%|████████▍ | 1557/1844 [13:23<02:20,  2.05it/s][A[A

TRANING :  84%|████████▍ | 1558/1844 [13:24<02:21,  2.02it/s][A[A

TRANING :  85%|████████▍ | 1559/1844 [13:24<02:26,  1.95it/s][A[A

TRANING :  85%|████████▍ | 1560/1844 [13:25<02:53,  1.63it/s][A[A

TRANING :  85%|████████▍ | 1561/1844 [13:26<02:34,  1.83it/s][A[A

TRANING :  85%|████████▍ | 1562/1844 [13:26<02:55,  1.60it/s][A[A

TRANING :  85%|████████▍ | 1563/1844 [13:27<02:40,  1.75it/s][A[A

TRANING :  85%|████████▍ | 1564/1844 [13:27<02:38,  1.77it/s][A[A

TRANING :  85%|████████▍ | 1565/1844 [13:28<02:26,  1.91it/s][A[A

TRANING :  85%|████████▍ | 1566/1844 [13:28<02:13,  2.09it/s][A[A

TRANING :  85%|████████▍ | 1567/1844 [13:29<02:31,  1.83it/s][A[A

TRANING :  85%|████████▌ | 1568/1844 [13:29<02:22,  1.93it/s][A[A

TRANING :  85%|████████▌ | 1569/1844 [13:30<02:38,  1.74it/s][A[A

TRANING :  85%|████████▌ | 1570/

epoch 2 step 3439 loss =0.32232 




TRANING :  87%|████████▋ | 1596/1844 [13:43<01:51,  2.22it/s][A[A

TRANING :  87%|████████▋ | 1597/1844 [13:43<01:52,  2.19it/s][A[A

TRANING :  87%|████████▋ | 1598/1844 [13:44<01:44,  2.35it/s][A[A

TRANING :  87%|████████▋ | 1599/1844 [13:44<01:54,  2.14it/s][A[A

TRANING :  87%|████████▋ | 1600/1844 [13:45<01:46,  2.30it/s][A[A

TRANING :  87%|████████▋ | 1601/1844 [13:45<01:46,  2.27it/s][A[A

TRANING :  87%|████████▋ | 1602/1844 [13:46<01:42,  2.36it/s][A[A

TRANING :  87%|████████▋ | 1603/1844 [13:46<01:41,  2.38it/s][A[A

TRANING :  87%|████████▋ | 1604/1844 [13:46<01:39,  2.42it/s][A[A

TRANING :  87%|████████▋ | 1605/1844 [13:47<01:41,  2.35it/s][A[A

TRANING :  87%|████████▋ | 1606/1844 [13:47<01:38,  2.42it/s][A[A

TRANING :  87%|████████▋ | 1607/1844 [13:48<01:37,  2.43it/s][A[A

TRANING :  87%|████████▋ | 1608/1844 [13:48<01:49,  2.16it/s][A[A

TRANING :  87%|████████▋ | 1609/1844 [13:49<01:57,  1.99it/s][A[A

TRANING :  87%|████████▋ | 1610/

epoch 2 step 3479 loss =0.28278 




TRANING :  89%|████████▊ | 1636/1844 [14:04<01:51,  1.86it/s][A[A

TRANING :  89%|████████▉ | 1637/1844 [14:04<01:37,  2.12it/s][A[A

TRANING :  89%|████████▉ | 1638/1844 [14:04<01:28,  2.33it/s][A[A

TRANING :  89%|████████▉ | 1639/1844 [14:05<01:29,  2.28it/s][A[A

TRANING :  89%|████████▉ | 1640/1844 [14:05<01:44,  1.95it/s][A[A

TRANING :  89%|████████▉ | 1641/1844 [14:06<01:30,  2.24it/s][A[A

TRANING :  89%|████████▉ | 1642/1844 [14:06<01:33,  2.17it/s][A[A

TRANING :  89%|████████▉ | 1643/1844 [14:07<01:34,  2.13it/s][A[A

TRANING :  89%|████████▉ | 1644/1844 [14:07<01:52,  1.78it/s][A[A

TRANING :  89%|████████▉ | 1645/1844 [14:08<01:57,  1.70it/s][A[A

TRANING :  89%|████████▉ | 1646/1844 [14:09<01:55,  1.71it/s][A[A

TRANING :  89%|████████▉ | 1647/1844 [14:09<01:40,  1.97it/s][A[A

TRANING :  89%|████████▉ | 1648/1844 [14:09<01:28,  2.21it/s][A[A

TRANING :  89%|████████▉ | 1649/1844 [14:10<01:31,  2.14it/s][A[A

TRANING :  89%|████████▉ | 1650/

epoch 2 step 3519 loss =0.21379 




TRANING :  91%|█████████ | 1676/1844 [14:23<01:25,  1.97it/s][A[A

TRANING :  91%|█████████ | 1677/1844 [14:24<01:40,  1.66it/s][A[A

TRANING :  91%|█████████ | 1678/1844 [14:25<01:59,  1.39it/s][A[A

TRANING :  91%|█████████ | 1679/1844 [14:25<01:43,  1.59it/s][A[A

TRANING :  91%|█████████ | 1680/1844 [14:25<01:30,  1.82it/s][A[A

TRANING :  91%|█████████ | 1681/1844 [14:26<01:23,  1.95it/s][A[A

TRANING :  91%|█████████ | 1682/1844 [14:27<01:30,  1.78it/s][A[A

TRANING :  91%|█████████▏| 1683/1844 [14:27<01:26,  1.86it/s][A[A

TRANING :  91%|█████████▏| 1684/1844 [14:27<01:21,  1.97it/s][A[A

TRANING :  91%|█████████▏| 1685/1844 [14:28<01:11,  2.21it/s][A[A

TRANING :  91%|█████████▏| 1686/1844 [14:28<01:16,  2.08it/s][A[A

TRANING :  91%|█████████▏| 1687/1844 [14:29<01:09,  2.25it/s][A[A

TRANING :  92%|█████████▏| 1688/1844 [14:29<01:07,  2.33it/s][A[A

TRANING :  92%|█████████▏| 1689/1844 [14:29<01:02,  2.49it/s][A[A

TRANING :  92%|█████████▏| 1690/

epoch 2 step 3559 loss =0.49574 




TRANING :  93%|█████████▎| 1716/1844 [14:45<01:23,  1.53it/s][A[A

TRANING :  93%|█████████▎| 1717/1844 [14:45<01:18,  1.62it/s][A[A

TRANING :  93%|█████████▎| 1718/1844 [14:46<01:10,  1.79it/s][A[A

TRANING :  93%|█████████▎| 1719/1844 [14:46<01:02,  2.01it/s][A[A

TRANING :  93%|█████████▎| 1720/1844 [14:47<01:09,  1.78it/s][A[A

TRANING :  93%|█████████▎| 1721/1844 [14:47<01:04,  1.92it/s][A[A

TRANING :  93%|█████████▎| 1722/1844 [14:48<00:58,  2.10it/s][A[A

TRANING :  93%|█████████▎| 1723/1844 [14:48<00:55,  2.20it/s][A[A

TRANING :  93%|█████████▎| 1724/1844 [14:49<01:05,  1.84it/s][A[A

TRANING :  94%|█████████▎| 1725/1844 [14:49<00:57,  2.08it/s][A[A

TRANING :  94%|█████████▎| 1726/1844 [14:50<00:51,  2.29it/s][A[A

TRANING :  94%|█████████▎| 1727/1844 [14:50<00:50,  2.33it/s][A[A

TRANING :  94%|█████████▎| 1728/1844 [14:50<00:50,  2.32it/s][A[A

TRANING :  94%|█████████▍| 1729/1844 [14:51<00:49,  2.34it/s][A[A

TRANING :  94%|█████████▍| 1730/

epoch 2 step 3599 loss =0.56078 




TRANING :  95%|█████████▌| 1756/1844 [15:05<00:40,  2.15it/s][A[A

TRANING :  95%|█████████▌| 1757/1844 [15:05<00:40,  2.17it/s][A[A

TRANING :  95%|█████████▌| 1758/1844 [15:06<00:42,  2.04it/s][A[A

TRANING :  95%|█████████▌| 1759/1844 [15:06<00:38,  2.22it/s][A[A

TRANING :  95%|█████████▌| 1760/1844 [15:07<00:39,  2.11it/s][A[A

TRANING :  95%|█████████▌| 1761/1844 [15:07<00:40,  2.07it/s][A[A

TRANING :  96%|█████████▌| 1762/1844 [15:08<00:36,  2.24it/s][A[A

TRANING :  96%|█████████▌| 1763/1844 [15:08<00:37,  2.19it/s][A[A

TRANING :  96%|█████████▌| 1764/1844 [15:09<00:41,  1.93it/s][A[A

TRANING :  96%|█████████▌| 1765/1844 [15:09<00:47,  1.65it/s][A[A

TRANING :  96%|█████████▌| 1766/1844 [15:10<00:49,  1.56it/s][A[A

TRANING :  96%|█████████▌| 1767/1844 [15:11<00:42,  1.81it/s][A[A

TRANING :  96%|█████████▌| 1768/1844 [15:11<00:39,  1.93it/s][A[A

TRANING :  96%|█████████▌| 1769/1844 [15:11<00:36,  2.06it/s][A[A

TRANING :  96%|█████████▌| 1770/

epoch 2 step 3639 loss =0.07626 




TRANING :  97%|█████████▋| 1796/1844 [15:25<00:22,  2.12it/s][A[A

TRANING :  97%|█████████▋| 1797/1844 [15:26<00:20,  2.31it/s][A[A

TRANING :  98%|█████████▊| 1798/1844 [15:26<00:21,  2.10it/s][A[A

TRANING :  98%|█████████▊| 1799/1844 [15:27<00:22,  2.00it/s][A[A

TRANING :  98%|█████████▊| 1800/1844 [15:27<00:20,  2.19it/s][A[A

TRANING :  98%|█████████▊| 1801/1844 [15:28<00:24,  1.77it/s][A[A

TRANING :  98%|█████████▊| 1802/1844 [15:29<00:24,  1.70it/s][A[A

TRANING :  98%|█████████▊| 1803/1844 [15:29<00:24,  1.68it/s][A[A

TRANING :  98%|█████████▊| 1804/1844 [15:30<00:24,  1.62it/s][A[A

TRANING :  98%|█████████▊| 1805/1844 [15:30<00:20,  1.86it/s][A[A

TRANING :  98%|█████████▊| 1806/1844 [15:31<00:20,  1.85it/s][A[A

TRANING :  98%|█████████▊| 1807/1844 [15:31<00:18,  2.06it/s][A[A

TRANING :  98%|█████████▊| 1808/1844 [15:32<00:17,  2.06it/s][A[A

TRANING :  98%|█████████▊| 1809/1844 [15:32<00:15,  2.24it/s][A[A

TRANING :  98%|█████████▊| 1810/

epoch 2 step 3679 loss =0.82373 




TRANING : 100%|█████████▉| 1836/1844 [15:45<00:04,  1.76it/s][A[A

TRANING : 100%|█████████▉| 1837/1844 [15:47<00:05,  1.35it/s][A[A

TRANING : 100%|█████████▉| 1838/1844 [15:47<00:03,  1.52it/s][A[A

TRANING : 100%|█████████▉| 1839/1844 [15:47<00:02,  1.74it/s][A[A

TRANING : 100%|█████████▉| 1840/1844 [15:48<00:02,  1.82it/s][A[A

TRANING : 100%|█████████▉| 1841/1844 [15:49<00:01,  1.70it/s][A[A

TRANING : 100%|█████████▉| 1842/1844 [15:49<00:01,  1.89it/s][A[A

TRANING : 100%|█████████▉| 1843/1844 [15:50<00:00,  1.85it/s][A[A

TRANING : 100%|██████████| 1844/1844 [15:50<00:00,  2.17it/s][A[A

[A[A

In [0]:
bert_model.eval()  # switch off dropout for evaluation
num = 0       # number of processed batches
dev_acc = 0   # running sum of per-batch accuracies
with torch.no_grad():  # gradients are not needed for evaluation
    for batch in tqdm(dev_dataloader, desc='DEV'):
        in_ind, length, label_ind = batch
        in_ind = in_ind.to(device)

        logits = bert_model(in_ind)[0].cpu().numpy()
        label_ind = label_ind.cpu().numpy().flatten()

        batch_acc = np.sum(np.argmax(logits, axis=1) == label_ind) / len(label_ind)
        dev_acc += batch_acc
        num += 1
# round(x) without a digits argument rounds to an integer, so keep 5 decimal places
print('Dev Acc =  {}'.format(round(dev_acc / num, 5)))

Dev Acc =  0.86567


In [0]:
print('Dev Acc =  {}'.format(round(dev_acc / num, 5)))  # num == 1809 dev batches

Dev Acc =  0.86567
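
The evaluation loop above averages per-batch accuracies. That matches the overall accuracy only when every batch has the same size; if the last batch is smaller, its examples get extra weight. A minimal sketch of the difference on made-up batches (the helper names `mean_of_batch_acc` and `sample_weighted_acc` are hypothetical and not part of the notebook):

```python
def mean_of_batch_acc(batches):
    """Average of per-batch accuracies (what the loop above computes)."""
    accs = [sum(p == y for p, y in zip(preds, labels)) / len(labels)
            for preds, labels in batches]
    return sum(accs) / len(accs)


def sample_weighted_acc(batches):
    """Correct predictions divided by the total number of examples."""
    correct = sum(sum(p == y for p, y in zip(preds, labels))
                  for preds, labels in batches)
    total = sum(len(labels) for _, labels in batches)
    return correct / total


# Made-up data: one full batch of 4 examples and a final batch of 1 example.
toy_batches = [
    ([0, 1, 2, 2], [0, 1, 2, 1]),  # 3 of 4 correct
    ([3], [0]),                    # 0 of 1 correct
]

print(mean_of_batch_acc(toy_batches))    # (0.75 + 0.0) / 2 = 0.375
print(sample_weighted_acc(toy_batches))  # 3 / 5 = 0.6
```

With 1809 dev batches the two numbers are probably close, but the sample-weighted version is the one that corresponds to the usual definition of accuracy.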


In [157]:
bert_model.eval()  # switch off dropout for evaluation
num = 0        # number of processed batches
test_acc = 0   # running sum of per-batch accuracies
with torch.no_grad():  # gradients are not needed for evaluation
    for batch in tqdm(test_dataloader, desc='TEST'):
        in_ind, length, label_ind = batch
        in_ind = in_ind.to(device)

        logits = bert_model(in_ind)[0].cpu().numpy()
        label_ind = label_ind.cpu().numpy().flatten()

        batch_acc = np.sum(np.argmax(logits, axis=1) == label_ind) / len(label_ind)
        test_acc += batch_acc
        num += 1






In [5]:
print('Test Acc =  {}'.format( round(test_acc / num , 5) ) )

Test Acc =  0.86178
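
A single accuracy number does not show which classes are being confused. A small sketch of per-class recall, assuming predictions and gold labels have been collected into two flat lists during the loop above (the function name and the toy labels are illustrative only, not real model output):

```python
from collections import Counter


def per_class_recall(preds, labels):
    """Return {class: fraction of its examples predicted correctly}."""
    total = Counter(labels)
    correct = Counter(y for p, y in zip(preds, labels) if p == y)
    return {cls: correct[cls] / n for cls, n in total.items()}


# Toy example with made-up class ids.
toy_preds = [0, 0, 1, 2, 2, 1]
toy_labels = [0, 1, 1, 2, 2, 2]
print(per_class_recall(toy_preds, toy_labels))
```

Classes with noticeably lower recall than the rest are natural candidates to inspect first.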


## Bonus part (up to 3 points)

Improve the accuracy on both the dev and test sets by any means, except using additional training data from the RCT2000 dataset:

* $> 0.86$: 1 point
* $> 0.88$: 2 points
* $> 0.9$: 3 points
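
One inexpensive thing to check first: the evaluation code above calls `bert_model(in_ind)` without an attention mask, so if batches are padded, the model also attends to padding positions. A hedged sketch of building such a mask, assuming padding uses token id 0 (as in `bert-base-uncased`, but an assumption about this notebook's data pipeline); the helper name `build_attention_mask` is hypothetical:

```python
PAD_ID = 0  # assumption: id of the [PAD] token in the tokenizer vocabulary


def build_attention_mask(batch_ids, pad_id=PAD_ID):
    """Return 1 for real tokens and 0 for padding, per sequence."""
    return [[int(tok != pad_id) for tok in seq] for seq in batch_ids]


# Two made-up padded sequences of token ids.
toy_batch = [
    [101, 2023, 2003, 102, 0, 0],
    [101, 7592, 102, 0, 0, 0],
]
print(build_attention_mask(toy_batch))
# [[1, 1, 1, 1, 0, 0], [1, 1, 1, 0, 0, 0]]
```

With PyTorch tensors the same mask can be written as `(in_ind != PAD_ID).long()`, and Hugging Face BERT models accept it through the `attention_mask` keyword argument of the forward call. Whether this helps here depends on how the batches were built, and the mask would need to be used during fine-tuning as well as evaluation.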