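# Getting Started with Fine-Tuning Using Hugging Face Transformers

This example walks through the main steps of fine-tuning a model with Transformers:
- Downloading the dataset
- Preprocessing the data
- Configuring training hyperparameters
- Setting up evaluation metrics
- A brief introduction to the Trainer
- Running the training
- Saving the model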

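## The YelpReviewFull Dataset

**Hugging Face dataset: [YelpReviewFull](https://huggingface.co/datasets/yelp_review_full)**

### Dataset Summary

The Yelp reviews dataset consists of reviews from Yelp. It is extracted from the Yelp Dataset Challenge 2015 data.

### Supported Tasks and Leaderboards
Text classification, sentiment classification: the dataset is mainly used for text classification, that is, predicting the sentiment of a given text.

### Languages
The reviews are written mainly in English.

### Dataset Structure

#### Data Instances
A typical data point consists of a text and its corresponding label.

An example from the YelpReviewFull test set: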

```json
{
    'label': 0,
    'text': 'I got \'new\' tires from them and within two weeks got a flat. I took my car to a local mechanic to see if i could get the hole patched, but they said the reason I had a flat was because the previous patch had blown - WAIT, WHAT? I just got the tire and never needed to have it patched? This was supposed to be a new tire. \\nI took the tire over to Flynn\'s and they told me that someone punctured my tire, then tried to patch it. So there are resentful tire slashers? I find that very unlikely. After arguing with the guy and telling him that his logic was far fetched he said he\'d give me a new tire \\"this time\\". \\nI will never go back to Flynn\'s b/c of the way this guy treated me and the simple fact that they gave me a used tire!'
}
```

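#### Data Fields

- 'text': the review text. In the raw data it is enclosed in double quotes ("), and any internal double quote is escaped as two double quotes (""). Newlines are escaped as a backslash followed by an "n" character, i.e. "\n".
- 'label': corresponds to the review's star rating (1 to 5). It is stored as a class index from 0 to 4, so the example above, with 'label': 0, is a 1-star review.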
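To make the field conventions concrete, here is a minimal sketch of how a record could be normalized before display. The helper names `clean_review_text` and `label_to_stars` are hypothetical and not part of the dataset or of any library. The escape handling assumes the literal `\n` and `\"` sequences visible in the sample instance above.

```python
def clean_review_text(raw: str) -> str:
    """Turn the literal escape sequences seen in the sample into real characters."""
    # A literal backslash followed by "n" becomes a real newline.
    text = raw.replace("\\n", "\n")
    # A literal backslash followed by a double quote becomes a plain double quote.
    text = text.replace('\\"', '"')
    return text


def label_to_stars(label: int) -> int:
    """Map a stored class index (assumed 0-4) to a star rating (1-5)."""
    if not 0 <= label <= 4:
        raise ValueError(f"unexpected label: {label}")
    return label + 1


# Hypothetical record shaped like the dataset's rows.
sample = {"label": 0, "text": 'He said \\"this time\\". \\nNever again.'}
print(label_to_stars(sample["label"]))
print(clean_review_text(sample["text"]))
```

Whether to unescape at all depends on the downstream tokenizer. The notebook itself feeds the raw text to the tokenizer unchanged.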

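#### Data Splits

The Yelp reviews full star dataset is built by randomly sampling 130,000 training examples and 10,000 test examples for each star rating from 1 to 5. In total there are 650,000 training examples and 50,000 test examples.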
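The split sizes follow directly from the per-class counts, and because the classes are balanced, a uniform random guess is right about one time in five. The short check below is only an arithmetic sketch. The constant names are made up for illustration.

```python
NUM_CLASSES = 5            # star ratings 1 to 5
TRAIN_PER_CLASS = 130_000  # training examples sampled per rating
TEST_PER_CLASS = 10_000    # test examples sampled per rating

train_total = NUM_CLASSES * TRAIN_PER_CLASS
test_total = NUM_CLASSES * TEST_PER_CLASS
# With perfectly balanced classes, uniform random guessing scores 1 / NUM_CLASSES.
chance_accuracy = 1 / NUM_CLASSES

print(train_total, test_total, chance_accuracy)
```

The 20% chance level is a useful baseline when reading the accuracy reported during fine-tuning.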

## Downloading the Dataset

In [1]:
from datasets import load_dataset

dataset = load_dataset("yelp_review_full")


In [2]:
dataset

DatasetDict({
    train: Dataset({
        features: ['label', 'text'],
        num_rows: 650000
    })
    test: Dataset({
        features: ['label', 'text'],
        num_rows: 50000
    })
})

In [4]:
dataset["train"][112]

{'label': 1,
 'text': "I'm not a huge fan of this location. I think that it was oddly built- the small, alley-like front makes it difficult to get past people on your way to/from the bathroom or tables when it's busy. And there's hardly any tables to sit at. Furthermore, people tend to clog the front on their way in, which makes things particularly difficult (especially in the winter weather). The staff were pretty impersonal to be, but maybe that was due to the high traffic of the place and the time I was there. And the coffee that I had was cold- I'm sure it was probably the bottom of the batch. I'd probably only walk in here again if someone else suggested it before or after a movie or while we were shopping in the area."}

In [5]:
import random
import pandas as pd
import datasets
from IPython.display import display, HTML

In [6]:
def show_random_elements(dataset, num_examples=10):
    assert num_examples <= len(dataset), "Can't pick more elements than there are in the dataset."
    picks = []
    for _ in range(num_examples):
        pick = random.randint(0, len(dataset)-1)
        while pick in picks:
            pick = random.randint(0, len(dataset)-1)
        picks.append(pick)
    
    df = pd.DataFrame(dataset[picks])
    for column, typ in dataset.features.items():
        if isinstance(typ, datasets.ClassLabel):
            df[column] = df[column].transform(lambda i: typ.names[i])
    display(HTML(df.to_html()))

In [7]:
show_random_elements(dataset["train"])

Unnamed: 0,label,text
0,2 star,"Despite the decent atmosphere and not too terrible location we get sat at one of the greasiest tables I've seen in a long time. We order appetizers and they come posthaste. Good, but not worth 10$ a plate good... Then comes the pink prickly pear chicken :/ poor execution at a great recipe. The goat cheese bruschetta was OK but slightly disappointing. If you want a 50$ tab for a few drink+ appetizers and 1 entree i suggest you go somewhere else. Oh and don't even think about giving them a bad yelp review, the owner will put a plaque on the wall for anyone that takes the time to write about how disappointing the food is. Guy can't take a hint."
1,3 stars,"We were looking for an outdoor dining on a Sunday evening and found Nori Sushi at the Market Place. When we arrived we were promptly seated with the menu; however, after waiting for a server for a while I had to ask a busboy to send someone to us. The place was pretty empty at the time,\n\nStarted off with fried calamari, which was tasty but they were very tiny. Good miso soup. Ahi poke was fresh and had good flavor. Spicy salmon was very good. Hamachi sashimi was okay. Good service."
2,5 stars,I am SO in love with the banana split sandwich!! Great food. Great service. Indoor and outdoor seating.
3,1 star,"No stars! Horrible I just wanted tap water after my husband bought a $4 coffee. Crystal refused and laughed and said come back with your own cup so I did and they all laughed l. I asked for a manager and she gave me a card that said \""princess rippy\"" (702) 497-9598 horrible horrible"
4,1 star,"The name says it all, culinary dropout, food and service both worthy of expulsion. \n\nI have never been so disappointed or disrespected in a restaurant as I was at culinary dropout. I placed an order for pickup and when I arrived, the food was sitting by the cash register at the bar in the open. It wasn't under a heat lamp or in anything to keep it warm, I even made a comment to the rude bartender who wanted to finish drying his wine glasses before approaching me asking why the food wasn't being kept warm and that it would be cold before getting home and his response was that it was warm. Then he printed out the check and all he said was \""cheers!\"" \n\nI ordered the steak, fish and chips and soft pretzel, all of which were cold and soggy, the steak was already cut up and half of it was fat. Not just marbled with fat, pure yellow fat! The fries were soggy, fondue dipping sauce was solidified, fish and chips were cold and limp. \n\nGiving the restaurant one star is generous, they can bet their bottom dollar I won't be returning,"
5,4 stars,"Wow, one of the best filets in Phoenix. I was skeptical but VERY impressed. You can pay extra for some steak toppings such as lump crab meat or shrimp, but be forewarned those sides are miniscule. Not worth the price - stick with just the filet and the sides it comes with. Plenty of food. Be sure to order to pretzel bread with cheese as an appetizer!"
6,5 stars,"I love coming here. The guys let me mix the flavors & actually add shots of vodka for no additional cost!! These guys are nice too. Its worth the prices. When visiting Vegas, come to this\""Evening Call\"" location. My favorite..."
7,1 star,"DO NOT STAY IN THIS HOTEL IF YOU ARE OVER 25 YEARS OF AGE!\n\nI stayed at the Planet Hollywood Hotel on April 18-21. It was the most miserable hotel experience that I have ever had. \n\nFirstly, when I took the bus from the airport, the busses are instructed to drop guests off at a certain location. It was a long walk from that point to the hotel reception (on an underground street), and there was no one to assist with bags. I was traveling alone with a suitcase, a carry on, a computer bag and a infant car seat (meeting my family later), and I had to manage all this by myself. The hotel itself has double doors at the entrance, and they are neither revolving nor automated. And there is no doorman to open the door for you.\n\nI had to stand in line for 45 minutes before I could check in to my room. I got a room with a view of the swimming pool (the alternative view is of back walls of the hotel). The view was very nice as I could see parts of the Vegas strip. The noise coming from the pool was so loud (loud music) that an afternoon nap was out of the question (and I was on the 18th floor!). You can not escape the blaring music. It's in the lobby while you're waiting to check in. It's even in the elevators. And while I think of it, if you don't want your kids to see the scantily clad women on the elevator doors, either blindfold them, or don't bring them to this hotel.\n\nAs a parent of young children, I try to get to bed at a reasonably early time. On my first night, at 2am, I was woken by loud noises coming from the adjoining room. Eventually I banged on their door, and told them to keep it down. The next morning, I called the front desk and told them about the incident. I told them that I wanted to move to another room, but all they could offer me was a room with no view, so I told them that I would stay where I was. 
I asked them to talk to the guests next door, and ask them to keep it down, but they were not willing to do that.\n\nOn my 3rd night, I was again woken by loud noises coming from next door at 1:35am. I called my neighbors on the hotel phone, and told them that they were creating a disturbance and needed to keep it down. They told me that \""this is Vegas, and we can do what we want\"". I called hotel security, and asked them to deal with it. Half an hour later, I was again woken by loud noises, and so again I called security who told me that they were busy moving the guests to another room. At 3am, I was woken by the phone in my room. There was no-one on phone. 15 minutes later same thing. I called the hotel manager, and told him about it. He said that he would put a block on my phone. At 4am, I was woken by loud banging on my door. No one there. Again, I called the manager, and asked him to review the security tapes (he told me that there were cameras on the floors). 15 minutes later, some hotel employee knocked on my door to say that there were in fact no cameras on the floors, and asked if there was a problem. At 5am, I was again woken by banging on my door, and there was no one there. I never did get back to sleep, and got about 90 minutes sleep total that night. \n\nI had very strong words with the manager in the morning, and told him that I felt that the hotel had not done enough to halt this nightmare (these guests should have been removed from the premises or the hotel should have placed someone outside my door). I told him that the hotel needed to comp me a room for a future stay (not that I would actually want to stay at this hotel again). He told me that my inconvenience didn't warrant further compensation, and offered me a free breakfast..\n\nThe pools at this hotel are average.\n\nMy recommendation if you're visiting Vegas and are not in your teens, pay the extra money and stay at the Bellagio."
8,3 stars,"Papas is a nice Portuguese restaurant with friendly staff and cozy atmosphere. The place has dark furniture, modern art and dimmed lights. \n\nThis is a tapas and martini bar, meaning that this restaurant only offers appetizers and drinks. Their menu consists of salads, lots of seafood and meat options. Their drinks are really good; I had their mojito and it was delish! Also, they have a nice selection of desserts. They had tiramisu, Portuguese cheeses, cr\u00e8me caramel, raspberry sorbet and more. Their desserts range from $7-$12.\n\nI had their chicken breast which was around $12. The chicken was made to perfection and it tasted really good. However, even for an appetizer portion it was rather small- about a few bite sizes. \n\nOverall, this is a nice tapas restaurant with great tasting food. However, it is overpriced for the potions that you receive."
9,3 stars,Poor airport layout. Never enough phone charging stations for the amount of people flying through here. Not super clean. Busy always. Wear your walking shoes. Also be prepared for disgruntled workers at the restaurants. 3 stars for location close to the Strip.


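## Preprocessing the Data

Once the dataset is downloaded, use a tokenizer to process the text. Inputs of varying length are handled with padding and truncation strategies.

The `map` method of Datasets applies a preprocessing function to the entire dataset in one call.

The cell below pads every example to the maximum length and processes the whole dataset: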

In [8]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")


def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True)


tokenized_datasets = dataset.map(tokenize_function, batched=True)
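With `batched=True`, the function passed to `map` receives a dictionary that maps column names to lists of values, which is why `tokenize_function` can pass `examples["text"]` straight to the tokenizer. The toy below only imitates that calling convention in plain Python. `toy_batched_map` and `add_length` are hypothetical names, and this is not how the Datasets library is implemented.

```python
from typing import Callable, Dict, List

Batch = Dict[str, List]


def toy_batched_map(columns: Batch, fn: Callable[[Batch], Batch], batch_size: int = 2) -> Batch:
    """Apply fn to slices of a column-oriented table and merge the results."""
    num_rows = len(next(iter(columns.values())))
    out: Batch = {name: [] for name in columns}
    for start in range(0, num_rows, batch_size):
        # Each batch is a dict of column name -> list of values for those rows.
        batch = {name: values[start:start + batch_size] for name, values in columns.items()}
        result = fn(batch)
        # Keep the original columns and append any new ones returned by fn.
        for name, values in {**batch, **result}.items():
            out.setdefault(name, []).extend(values)
    return out


def add_length(batch: Batch) -> Batch:
    """Example preprocessing function: one new value per row in the batch."""
    return {"num_chars": [len(text) for text in batch["text"]]}


table = {"label": [0, 4, 2], "text": ["bad", "great!", "ok"]}
mapped = toy_batched_map(table, add_length)
print(mapped)
```

Working on whole batches lets a function such as the tokenizer process many texts per call instead of one row at a time.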

In [9]:
show_random_elements(tokenized_datasets["train"], num_examples=1)

Unnamed: 0,label,text,input_ids,token_type_ids,attention_mask
0,1 star,Horrible sushi.... service was fine but will not be back!,"[101, 9800, 27788, 28117, 5933, 119, 119, 119, 119, 1555, 1108, 2503, 1133, 1209, 1136, 1129, 1171, 106, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...]","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...]","[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...]"
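The row above shows the effect of `padding="max_length"`: the real token ids (bracketed by 101 and 102, BERT's `[CLS]` and `[SEP]` tokens) are followed by zeros, and `attention_mask` is 1 for real tokens and 0 for padding. The sketch below reproduces that shape with a toy function. `pad_or_truncate` is a hypothetical helper, and its truncation simply cuts the list, whereas the real tokenizer also keeps the closing special token.

```python
from typing import Dict, List

PAD_ID = 0  # padding id, matching the zeros in the output above


def pad_or_truncate(token_ids: List[int], max_length: int) -> Dict[str, List[int]]:
    """Cut a sequence to max_length, then right-pad it with PAD_ID."""
    kept = token_ids[:max_length]
    num_pad = max_length - len(kept)
    return {
        "input_ids": kept + [PAD_ID] * num_pad,
        # 1 marks a real token, 0 marks padding the model should ignore.
        "attention_mask": [1] * len(kept) + [0] * num_pad,
    }


short = pad_or_truncate([101, 9800, 119, 102], max_length=6)
long = pad_or_truncate(list(range(1, 10)), max_length=4)
print(short)
print(long)
```

Padding everything to one fixed length keeps the tensors rectangular, at the cost of computing over many padding positions for short reviews.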


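### Sampling the Data

The original demo uses 1,000 samples to demonstrate small-scale training on BERT (with the PyTorch Trainer). The cell below uses the full dataset instead; the 1,000-sample lines are kept as comments.

`shuffle()` randomly reorders the rows of the dataset. For more control over the shuffling algorithm, pass the `generator` argument to use a different `numpy.random.Generator`.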

In [10]:
#change with full data set
#small_train_dataset = tokenized_datasets["train"].shuffle(seed=42).select(range(1000))
#small_eval_dataset = tokenized_datasets["test"].shuffle(seed=42).select(range(1000))
full_train_dataset = tokenized_datasets["train"].shuffle(seed=42)
full_eval_dataset = tokenized_datasets["test"].shuffle(seed=42)
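The commented-out lines combine a seeded shuffle with `select(range(1000))` to draw a reproducible random subset. The idea can be sketched with the standard library alone. `shuffled_subset` is a hypothetical helper, and it will not reproduce the exact permutation that `Dataset.shuffle(seed=42)` produces, since that method uses NumPy's generator.

```python
import random
from typing import List


def shuffled_subset(num_rows: int, subset_size: int, seed: int = 42) -> List[int]:
    """Return the first subset_size row indices of a seeded random permutation."""
    indices = list(range(num_rows))
    # A dedicated Random instance keeps the result independent of global state.
    random.Random(seed).shuffle(indices)
    return indices[:subset_size]


first = shuffled_subset(num_rows=650_000, subset_size=1000)
second = shuffled_subset(num_rows=650_000, subset_size=1000)
print(len(first), first == second)
```

Fixing the seed means repeated runs train and evaluate on the same examples, which makes results comparable across experiments.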


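## Fine-Tuning Configuration

### Loading the BERT Model

The warning printed by the cell below tells us that some weights (`classifier.weight` and `classifier.bias`) are newly, randomly initialized. This is completely normal when fine-tuning: the head used for the pretraining task (masked language modeling) is dropped and replaced with a new sequence-classification head for which there are no pretrained weights. The library therefore warns that the model should be fine-tuned before it is used for inference, which is exactly what we are about to do.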

In [11]:
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased", num_labels=5)

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


### Training Hyperparameters (TrainingArguments)

Full list of parameters and their defaults: https://huggingface.co/docs/transformers/v4.36.1/en/main_classes/trainer#transformers.TrainingArguments

Source code definition: https://github.com/huggingface/transformers/blob/v4.36.1/src/transformers/training_args.py#L161

**The most important setting: the path where model weights are saved (`output_dir`)**

In [12]:
from transformers import TrainingArguments

model_dir = "models/bert-base-cased-finetune-yelp"

# logging_steps defaults to 500; set it to 100 to suit our training data and step count
training_args = TrainingArguments(output_dir=model_dir,
                                  per_device_train_batch_size=16,
                                  num_train_epochs=5,
                                  logging_steps=100)

In [13]:
# Full hyperparameter configuration
print(training_args)

TrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_backend=None,
ddp_broadcast_buffers=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=False,
do_predict=False,
do_train=False,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=None,
evaluation_strategy=IntervalStrategy.NO,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_config={'fsdp_min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
gradient_accumulation_steps=1,
gradient_checkpointing=False,
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_model_id=None,
hub_private_re

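### Evaluating Metrics During Training (Evaluate)

The **[Hugging Face Evaluate library](https://huggingface.co/docs/evaluate/index)** provides dozens of evaluation methods across domains (natural language processing, computer vision, reinforcement learning, and more), each loadable with a single line of code. **Full list of supported metrics: https://huggingface.co/evaluate-metric**

The Trainer does not automatically evaluate model performance during training, so we need to pass it a function that computes and reports metrics.

The Evaluate library provides a simple accuracy metric, which you can load with `evaluate.load`: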

In [14]:
import numpy as np
import evaluate

metric = evaluate.load("accuracy")


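Next, call `compute` to calculate the accuracy of the predictions.

Before passing predictions to `compute`, we need to convert the logits into predicted classes (**all Transformers models return logits**).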

In [15]:
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)
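To see what `compute_metrics` does without NumPy or the Evaluate library, the sketch below performs the same two steps in plain Python: take the index of the largest logit in each row, then count how often it matches the label. The function names `argmax` and `accuracy_from_logits` and the sample logits are made up for illustration.

```python
from typing import Dict, List, Sequence


def argmax(row: Sequence[float]) -> int:
    """Index of the largest value in a row of logits."""
    return max(range(len(row)), key=row.__getitem__)


def accuracy_from_logits(logits: List[Sequence[float]], labels: List[int]) -> Dict[str, float]:
    """Convert logits to class predictions and compare them with the labels."""
    predictions = [argmax(row) for row in logits]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return {"accuracy": correct / len(labels)}


# Three hypothetical examples with five classes each.
toy_logits = [
    [0.1, 2.0, -1.0, 0.0, 0.3],   # predicted class 1
    [1.5, 0.2, 0.1, 0.0, -0.4],   # predicted class 0
    [0.0, 0.1, 0.2, 0.3, 3.0],    # predicted class 4
]
toy_labels = [1, 0, 2]
result = accuracy_from_logits(toy_logits, toy_labels)
print(result)
```

No softmax is needed here, because the largest logit and the largest probability always fall on the same class.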

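#### Monitoring Metrics During Training

To monitor how evaluation metrics change during training, set the `evaluation_strategy` parameter in `TrainingArguments` so that metrics are reported at the end of each epoch.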

In [16]:
from transformers import TrainingArguments, Trainer

training_args = TrainingArguments(output_dir=model_dir,
                                  evaluation_strategy="epoch", 
                                  per_device_train_batch_size=16,
                                  num_train_epochs=3,
                                  logging_steps=30)
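These settings determine how long a run on the full training set takes. The sketch below estimates the number of optimizer steps, assuming a single device and no gradient accumulation (consistent with `_n_gpu=1` and `gradient_accumulation_steps=1` in the printed arguments). `total_training_steps` is a hypothetical helper, and the 1.11 s per step figure is taken from the progress bar in the training output.

```python
import math


def total_training_steps(num_examples: int, per_device_batch_size: int,
                         num_epochs: int, num_devices: int = 1) -> int:
    """Estimate optimizer steps, assuming no gradient accumulation."""
    effective_batch = per_device_batch_size * num_devices
    # The last batch of an epoch may be smaller, hence the ceiling.
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return steps_per_epoch * num_epochs


steps = total_training_steps(num_examples=650_000, per_device_batch_size=16, num_epochs=3)
print(steps)  # 121875

seconds_per_step = 1.11  # observed rate in the training log
estimated_hours = steps * seconds_per_step / 3600
print(round(estimated_hours, 1))  # 37.6
```

The result matches the total of 121875 shown in the progress bar, and the time estimate is in line with the roughly 37 hours it reports.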

## Start Training

### Instantiating the Trainer

Note on the `kernel version` issue: it does not currently affect running this example code.

In [17]:
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=full_train_dataset,
    eval_dataset=full_eval_dataset,
    compute_metrics=compute_metrics,
)
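No learning-rate schedule is configured above, yet the `learning_rate` values in the training log shrink slightly at every logging step. They are consistent with a linear decay from 5e-5 to zero over all 121875 steps with no warmup. The sketch below reproduces that pattern. `linear_decay_lr` is a hypothetical helper, and the base rate and schedule shape are inferred from the logged values rather than read from the Trainer source.

```python
def linear_decay_lr(step: int, total_steps: int,
                    base_lr: float = 5e-5, warmup_steps: int = 0) -> float:
    """Learning rate after `step` updates under linear warmup and linear decay."""
    if step < warmup_steps:
        # Ramp up linearly from 0 to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    # Decay linearly from base_lr to 0 over the remaining steps.
    return base_lr * remaining / max(1, total_steps - warmup_steps)


TOTAL_STEPS = 121875  # total shown in the training progress bar
for step in (30, 60, 1500):
    print(step, linear_decay_lr(step, TOTAL_STEPS))
```

For step 30 this gives about 4.99877e-05 and for step 1500 about 4.93846e-05, in agreement with the logged values at those steps.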

## Checking GPU Usage with nvidia-smi

To watch GPU usage in real time, poll with the `watch` command: `watch -n 1 nvidia-smi`:

```shell
Every 1.0s: nvidia-smi                                                   Wed Dec 20 14:37:41 2023

Wed Dec 20 14:37:41 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03             Driver Version: 535.129.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla T4                       Off | 00000000:00:0D.0 Off |                    0 |
| N/A   64C    P0              69W /  70W |   6665MiB / 15360MiB |     98%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A     18395      C   /root/miniconda3/bin/python                6660MiB |
+---------------------------------------------------------------------------------------+
```
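If you want the memory figures as numbers rather than reading them off the table, the text output can be parsed. The sketch below extracts the `used / total` memory pair from a line copied from the output above. `parse_memory_usage` is a hypothetical helper, and the regular expression assumes the `...MiB / ...MiB` layout shown here, which may differ across driver versions.

```python
import re
from typing import List, Tuple

# Matches pairs such as "6665MiB / 15360MiB" in the per-GPU summary row.
MEMORY_PATTERN = re.compile(r"(\d+)MiB\s*/\s*(\d+)MiB")


def parse_memory_usage(smi_output: str) -> List[Tuple[int, int]]:
    """Return one (used_mib, total_mib) pair per GPU row found in the text."""
    return [(int(used), int(total)) for used, total in MEMORY_PATTERN.findall(smi_output)]


sample_line = "| N/A   64C    P0              69W /  70W |   6665MiB / 15360MiB |     98%      Default |"
used_mib, total_mib = parse_memory_usage(sample_line)[0]
print(used_mib, total_mib, f"{used_mib / total_mib:.1%}")
```

On the sample line this reports 6665 MiB used out of 15360 MiB, about 43.4% of the card's memory.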

In [18]:
trainer.train()

  0%|          | 30/121875 [00:34<37:55:20,  1.12s/it]

{'loss': 1.5949, 'learning_rate': 4.998769230769231e-05, 'epoch': 0.0}


  0%|          | 60/121875 [01:08<37:37:19,  1.11s/it]

{'loss': 1.3109, 'learning_rate': 4.997538461538462e-05, 'epoch': 0.0}


  0%|          | 90/121875 [01:41<37:33:15,  1.11s/it]

{'loss': 1.2402, 'learning_rate': 4.9963076923076926e-05, 'epoch': 0.0}


  0%|          | 120/121875 [02:14<37:35:31,  1.11s/it]

{'loss': 1.1752, 'learning_rate': 4.995076923076923e-05, 'epoch': 0.0}


  0%|          | 150/121875 [02:48<37:27:40,  1.11s/it]

{'loss': 1.0552, 'learning_rate': 4.993846153846154e-05, 'epoch': 0.0}


  0%|          | 180/121875 [03:21<37:31:46,  1.11s/it]

{'loss': 1.09, 'learning_rate': 4.992615384615385e-05, 'epoch': 0.0}


  0%|          | 210/121875 [03:54<36:40:26,  1.09s/it]

{'loss': 1.1678, 'learning_rate': 4.9913846153846156e-05, 'epoch': 0.01}


  0%|          | 240/121875 [04:26<36:47:40,  1.09s/it]

{'loss': 1.0615, 'learning_rate': 4.990153846153847e-05, 'epoch': 0.01}


  0%|          | 270/121875 [04:59<37:22:08,  1.11s/it]

{'loss': 0.991, 'learning_rate': 4.9889230769230774e-05, 'epoch': 0.01}


  0%|          | 300/121875 [05:32<37:36:15,  1.11s/it]

{'loss': 1.0596, 'learning_rate': 4.987692307692308e-05, 'epoch': 0.01}


  0%|          | 330/121875 [06:05<37:48:53,  1.12s/it]

{'loss': 1.0138, 'learning_rate': 4.9864615384615386e-05, 'epoch': 0.01}


  0%|          | 360/121875 [06:39<37:20:26,  1.11s/it]

{'loss': 1.0583, 'learning_rate': 4.985230769230769e-05, 'epoch': 0.01}


  0%|          | 390/121875 [07:12<37:28:38,  1.11s/it]

{'loss': 1.0042, 'learning_rate': 4.9840000000000004e-05, 'epoch': 0.01}


  0%|          | 420/121875 [07:45<37:16:02,  1.10s/it]

{'loss': 1.0733, 'learning_rate': 4.982769230769231e-05, 'epoch': 0.01}


  0%|          | 450/121875 [08:18<37:28:12,  1.11s/it]

{'loss': 1.0313, 'learning_rate': 4.981538461538462e-05, 'epoch': 0.01}


  0%|          | 480/121875 [08:52<37:22:51,  1.11s/it]

{'loss': 0.986, 'learning_rate': 4.980307692307693e-05, 'epoch': 0.01}


  0%|          | 510/121875 [09:27<38:19:43,  1.14s/it]

{'loss': 0.9699, 'learning_rate': 4.9790769230769234e-05, 'epoch': 0.01}


  0%|          | 540/121875 [10:01<37:17:34,  1.11s/it]

{'loss': 0.9605, 'learning_rate': 4.977846153846154e-05, 'epoch': 0.01}


  0%|          | 570/121875 [10:34<37:24:51,  1.11s/it]

{'loss': 0.9519, 'learning_rate': 4.976615384615385e-05, 'epoch': 0.01}


  0%|          | 600/121875 [11:07<37:16:25,  1.11s/it]

{'loss': 1.0489, 'learning_rate': 4.975384615384616e-05, 'epoch': 0.01}


  1%|          | 630/121875 [11:40<37:08:35,  1.10s/it]

{'loss': 0.9411, 'learning_rate': 4.974153846153846e-05, 'epoch': 0.02}


  1%|          | 660/121875 [12:14<37:26:52,  1.11s/it]

{'loss': 0.9921, 'learning_rate': 4.9729230769230776e-05, 'epoch': 0.02}


  1%|          | 690/121875 [12:47<37:21:30,  1.11s/it]

{'loss': 0.9655, 'learning_rate': 4.9716923076923075e-05, 'epoch': 0.02}


  1%|          | 720/121875 [13:20<36:55:25,  1.10s/it]

{'loss': 0.9697, 'learning_rate': 4.970461538461539e-05, 'epoch': 0.02}


  1%|          | 750/121875 [13:53<37:07:43,  1.10s/it]

{'loss': 0.9986, 'learning_rate': 4.969230769230769e-05, 'epoch': 0.02}


  1%|          | 780/121875 [14:26<37:16:12,  1.11s/it]

{'loss': 1.0164, 'learning_rate': 4.9680000000000005e-05, 'epoch': 0.02}


  1%|          | 810/121875 [15:00<37:02:48,  1.10s/it]

{'loss': 1.0309, 'learning_rate': 4.966769230769231e-05, 'epoch': 0.02}


  1%|          | 840/121875 [15:33<37:05:26,  1.10s/it]

{'loss': 0.9995, 'learning_rate': 4.965538461538462e-05, 'epoch': 0.02}


  1%|          | 870/121875 [16:06<37:19:53,  1.11s/it]

{'loss': 0.9565, 'learning_rate': 4.964307692307692e-05, 'epoch': 0.02}


  1%|          | 900/121875 [16:39<37:06:15,  1.10s/it]

{'loss': 0.9139, 'learning_rate': 4.963076923076923e-05, 'epoch': 0.02}


  1%|          | 930/121875 [17:12<37:04:36,  1.10s/it]

{'loss': 0.9941, 'learning_rate': 4.961846153846154e-05, 'epoch': 0.02}


  1%|          | 960/121875 [17:45<37:08:30,  1.11s/it]

{'loss': 0.9283, 'learning_rate': 4.9606153846153847e-05, 'epoch': 0.02}


  1%|          | 990/121875 [18:18<37:15:23,  1.11s/it]

{'loss': 0.9899, 'learning_rate': 4.959384615384616e-05, 'epoch': 0.02}


  1%|          | 1020/121875 [18:54<37:02:36,  1.10s/it]

{'loss': 0.9276, 'learning_rate': 4.9581538461538465e-05, 'epoch': 0.03}


  1%|          | 1050/121875 [19:27<37:12:19,  1.11s/it]

{'loss': 0.9666, 'learning_rate': 4.956923076923077e-05, 'epoch': 0.03}


  1%|          | 1080/121875 [20:00<37:08:08,  1.11s/it]

{'loss': 0.9668, 'learning_rate': 4.9556923076923076e-05, 'epoch': 0.03}


  1%|          | 1110/121875 [20:34<37:13:31,  1.11s/it]

{'loss': 0.9597, 'learning_rate': 4.954461538461539e-05, 'epoch': 0.03}


  1%|          | 1140/121875 [21:07<37:05:17,  1.11s/it]

{'loss': 0.9135, 'learning_rate': 4.9532307692307694e-05, 'epoch': 0.03}


  1%|          | 1170/121875 [21:40<37:00:32,  1.10s/it]

{'loss': 0.9101, 'learning_rate': 4.952e-05, 'epoch': 0.03}


  1%|          | 1200/121875 [22:13<37:07:14,  1.11s/it]

{'loss': 0.9261, 'learning_rate': 4.950769230769231e-05, 'epoch': 0.03}


  1%|          | 1230/121875 [22:46<36:59:12,  1.10s/it]

{'loss': 0.9766, 'learning_rate': 4.949538461538462e-05, 'epoch': 0.03}


  1%|          | 1260/121875 [23:20<37:41:31,  1.12s/it]

{'loss': 1.0087, 'learning_rate': 4.9483076923076924e-05, 'epoch': 0.03}


  1%|          | 1290/121875 [23:54<37:50:43,  1.13s/it]

{'loss': 0.9554, 'learning_rate': 4.947076923076923e-05, 'epoch': 0.03}


  1%|          | 1320/121875 [24:28<37:44:21,  1.13s/it]

{'loss': 1.0137, 'learning_rate': 4.945846153846154e-05, 'epoch': 0.03}


  1%|          | 1350/121875 [25:01<37:48:39,  1.13s/it]

{'loss': 0.9294, 'learning_rate': 4.944615384615385e-05, 'epoch': 0.03}


  1%|          | 1380/121875 [25:35<37:38:04,  1.12s/it]

{'loss': 0.9123, 'learning_rate': 4.943384615384616e-05, 'epoch': 0.03}


  1%|          | 1410/121875 [26:09<37:39:24,  1.13s/it]

{'loss': 0.9205, 'learning_rate': 4.9421538461538466e-05, 'epoch': 0.03}


  1%|          | 1440/121875 [26:43<37:29:33,  1.12s/it]

{'loss': 0.8981, 'learning_rate': 4.940923076923077e-05, 'epoch': 0.04}


  1%|          | 1470/121875 [27:16<37:42:21,  1.13s/it]

{'loss': 0.9646, 'learning_rate': 4.939692307692308e-05, 'epoch': 0.04}


  1%|          | 1500/121875 [27:50<37:32:51,  1.12s/it]

{'loss': 0.8889, 'learning_rate': 4.9384615384615384e-05, 'epoch': 0.04}


  1%|▏         | 1530/121875 [28:26<37:45:31,  1.13s/it]

{'loss': 0.8818, 'learning_rate': 4.9372307692307696e-05, 'epoch': 0.04}


  1%|▏         | 1560/121875 [29:00<37:42:38,  1.13s/it]

{'loss': 0.9377, 'learning_rate': 4.936e-05, 'epoch': 0.04}


  1%|▏         | 1590/121875 [29:34<37:40:49,  1.13s/it]

{'loss': 0.9599, 'learning_rate': 4.9347692307692314e-05, 'epoch': 0.04}


  1%|▏         | 1620/121875 [30:08<37:42:11,  1.13s/it]

{'loss': 0.9568, 'learning_rate': 4.933538461538462e-05, 'epoch': 0.04}


  1%|▏         | 1650/121875 [30:42<37:37:40,  1.13s/it]

{'loss': 0.9589, 'learning_rate': 4.9323076923076926e-05, 'epoch': 0.04}


  1%|▏         | 1680/121875 [31:16<37:33:57,  1.13s/it]

{'loss': 0.8828, 'learning_rate': 4.931076923076923e-05, 'epoch': 0.04}


  1%|▏         | 1710/121875 [31:50<37:38:26,  1.13s/it]

{'loss': 0.9452, 'learning_rate': 4.9298461538461544e-05, 'epoch': 0.04}


  1%|▏         | 1740/121875 [32:24<37:45:30,  1.13s/it]

{'loss': 0.9404, 'learning_rate': 4.928615384615385e-05, 'epoch': 0.04}


  1%|▏         | 1770/121875 [32:57<37:31:38,  1.12s/it]

{'loss': 0.9225, 'learning_rate': 4.9273846153846155e-05, 'epoch': 0.04}


  1%|▏         | 1800/121875 [33:31<37:37:16,  1.13s/it]

{'loss': 0.8643, 'learning_rate': 4.926153846153847e-05, 'epoch': 0.04}


  2%|▏         | 1830/121875 [34:05<37:42:05,  1.13s/it]

{'loss': 0.939, 'learning_rate': 4.924923076923077e-05, 'epoch': 0.05}


  2%|▏         | 1860/121875 [34:39<37:32:55,  1.13s/it]

{'loss': 0.846, 'learning_rate': 4.923692307692308e-05, 'epoch': 0.05}


  2%|▏         | 1890/121875 [35:13<37:32:45,  1.13s/it]

{'loss': 0.8787, 'learning_rate': 4.9224615384615385e-05, 'epoch': 0.05}


  2%|▏         | 1920/121875 [35:46<37:48:31,  1.13s/it]

{'loss': 0.9194, 'learning_rate': 4.92123076923077e-05, 'epoch': 0.05}


  2%|▏         | 1950/121875 [36:20<37:23:12,  1.12s/it]

{'loss': 0.8767, 'learning_rate': 4.92e-05, 'epoch': 0.05}


  2%|▏         | 1980/121875 [36:54<37:39:24,  1.13s/it]

{'loss': 0.9272, 'learning_rate': 4.9187692307692316e-05, 'epoch': 0.05}


  2%|▏         | 2010/121875 [37:30<38:25:34,  1.15s/it]

{'loss': 0.9546, 'learning_rate': 4.9175384615384615e-05, 'epoch': 0.05}


  2%|▏         | 2040/121875 [38:04<37:33:28,  1.13s/it]

{'loss': 0.9589, 'learning_rate': 4.916307692307692e-05, 'epoch': 0.05}


  2%|▏         | 2070/121875 [38:38<37:28:04,  1.13s/it]

{'loss': 0.9142, 'learning_rate': 4.915076923076923e-05, 'epoch': 0.05}


  2%|▏         | 2100/121875 [39:11<37:19:04,  1.12s/it]

{'loss': 0.9779, 'learning_rate': 4.913846153846154e-05, 'epoch': 0.05}


  2%|▏         | 2130/121875 [39:45<37:31:46,  1.13s/it]

{'loss': 0.9089, 'learning_rate': 4.912615384615385e-05, 'epoch': 0.05}


  2%|▏         | 2160/121875 [40:19<37:36:09,  1.13s/it]

{'loss': 0.9228, 'learning_rate': 4.911384615384616e-05, 'epoch': 0.05}


  2%|▏         | 2190/121875 [40:53<37:20:01,  1.12s/it]

{'loss': 0.9269, 'learning_rate': 4.910153846153846e-05, 'epoch': 0.05}


  2%|▏         | 2220/121875 [41:27<37:28:31,  1.13s/it]

{'loss': 0.9481, 'learning_rate': 4.908923076923077e-05, 'epoch': 0.05}


  2%|▏         | 2250/121875 [42:00<37:19:47,  1.12s/it]

{'loss': 1.0004, 'learning_rate': 4.907692307692308e-05, 'epoch': 0.06}


  2%|▏         | 2280/121875 [42:34<37:25:50,  1.13s/it]

{'loss': 0.8567, 'learning_rate': 4.9064615384615387e-05, 'epoch': 0.06}


  2%|▏         | 2310/121875 [43:08<37:18:14,  1.12s/it]

{'loss': 0.8416, 'learning_rate': 4.90523076923077e-05, 'epoch': 0.06}


  2%|▏         | 2340/121875 [43:41<36:29:43,  1.10s/it]

{'loss': 0.8914, 'learning_rate': 4.9040000000000005e-05, 'epoch': 0.06}


  2%|▏         | 2370/121875 [44:14<36:30:45,  1.10s/it]

{'loss': 0.8676, 'learning_rate': 4.902769230769231e-05, 'epoch': 0.06}


  2%|▏         | 2400/121875 [44:47<36:46:10,  1.11s/it]

{'loss': 0.9346, 'learning_rate': 4.9015384615384616e-05, 'epoch': 0.06}


  2%|▏         | 2430/121875 [45:21<37:17:06,  1.12s/it]

{'loss': 0.912, 'learning_rate': 4.900307692307692e-05, 'epoch': 0.06}


  2%|▏         | 2460/121875 [45:55<37:14:58,  1.12s/it]

{'loss': 0.8909, 'learning_rate': 4.8990769230769234e-05, 'epoch': 0.06}


  2%|▏         | 2490/121875 [46:28<37:09:15,  1.12s/it]

{'loss': 0.8877, 'learning_rate': 4.897846153846154e-05, 'epoch': 0.06}


  2%|▏         | 2520/121875 [47:04<37:22:36,  1.13s/it]

{'loss': 0.8508, 'learning_rate': 4.896615384615385e-05, 'epoch': 0.06}


  2%|▏         | 2550/121875 [47:38<36:59:18,  1.12s/it]

{'loss': 0.9985, 'learning_rate': 4.895384615384616e-05, 'epoch': 0.06}


  2%|▏         | 2580/121875 [48:11<36:33:24,  1.10s/it]

{'loss': 0.9463, 'learning_rate': 4.8941538461538464e-05, 'epoch': 0.06}


  2%|▏         | 2610/121875 [48:44<36:29:10,  1.10s/it]

{'loss': 0.9255, 'learning_rate': 4.892923076923077e-05, 'epoch': 0.06}


  2%|▏         | 2640/121875 [49:18<36:34:16,  1.10s/it]

{'loss': 0.8555, 'learning_rate': 4.8916923076923076e-05, 'epoch': 0.06}


  2%|▏         | 2670/121875 [49:51<36:30:30,  1.10s/it]

{'loss': 0.8829, 'learning_rate': 4.890461538461539e-05, 'epoch': 0.07}


  2%|▏         | 2700/121875 [50:24<36:36:34,  1.11s/it]

{'loss': 0.8623, 'learning_rate': 4.8892307692307694e-05, 'epoch': 0.07}


  2%|▏         | 2730/121875 [50:57<36:29:26,  1.10s/it]

{'loss': 0.9438, 'learning_rate': 4.8880000000000006e-05, 'epoch': 0.07}


  2%|▏         | 2760/121875 [51:30<36:37:16,  1.11s/it]

{'loss': 0.8925, 'learning_rate': 4.886769230769231e-05, 'epoch': 0.07}


  2%|▏         | 2790/121875 [52:03<36:37:31,  1.11s/it]

{'loss': 0.8352, 'learning_rate': 4.885538461538462e-05, 'epoch': 0.07}


  2%|▏         | 2820/121875 [52:36<36:35:56,  1.11s/it]

{'loss': 0.8913, 'learning_rate': 4.8843076923076924e-05, 'epoch': 0.07}


  2%|▏         | 2850/121875 [53:10<36:27:11,  1.10s/it]

{'loss': 0.9082, 'learning_rate': 4.8830769230769236e-05, 'epoch': 0.07}


  2%|▏         | 2880/121875 [53:43<36:32:06,  1.11s/it]

{'loss': 0.9207, 'learning_rate': 4.881846153846154e-05, 'epoch': 0.07}


  2%|▏         | 2910/121875 [54:16<36:21:18,  1.10s/it]

{'loss': 0.8901, 'learning_rate': 4.880615384615385e-05, 'epoch': 0.07}


  2%|▏         | 2940/121875 [54:49<36:29:11,  1.10s/it]

{'loss': 0.9215, 'learning_rate': 4.879384615384616e-05, 'epoch': 0.07}


  2%|▏         | 2970/121875 [55:22<36:34:35,  1.11s/it]

{'loss': 0.9044, 'learning_rate': 4.878153846153846e-05, 'epoch': 0.07}


  2%|▏         | 3000/121875 [55:55<36:31:46,  1.11s/it]

{'loss': 0.8594, 'learning_rate': 4.876923076923077e-05, 'epoch': 0.07}


  2%|▏         | 3030/121875 [56:31<36:16:18,  1.10s/it]

{'loss': 0.8457, 'learning_rate': 4.875692307692308e-05, 'epoch': 0.07}


  3%|▎         | 3060/121875 [57:04<36:34:29,  1.11s/it]

{'loss': 0.9053, 'learning_rate': 4.874461538461539e-05, 'epoch': 0.08}


  3%|▎         | 3090/121875 [57:37<36:18:46,  1.10s/it]

{'loss': 0.9457, 'learning_rate': 4.8732307692307695e-05, 'epoch': 0.08}


  3%|▎         | 3120/121875 [58:10<36:19:43,  1.10s/it]

{'loss': 0.862, 'learning_rate': 4.872000000000001e-05, 'epoch': 0.08}


  3%|▎         | 3150/121875 [58:43<36:25:16,  1.10s/it]

{'loss': 0.9197, 'learning_rate': 4.870769230769231e-05, 'epoch': 0.08}


  3%|▎         | 3180/121875 [59:16<36:20:30,  1.10s/it]

{'loss': 0.8627, 'learning_rate': 4.869538461538462e-05, 'epoch': 0.08}


  3%|▎         | 3210/121875 [59:49<36:12:17,  1.10s/it]

{'loss': 0.8376, 'learning_rate': 4.8683076923076925e-05, 'epoch': 0.08}


  3%|▎         | 3240/121875 [1:00:22<36:32:12,  1.11s/it]

{'loss': 0.9275, 'learning_rate': 4.867076923076923e-05, 'epoch': 0.08}


  3%|▎         | 3270/121875 [1:00:55<36:15:27,  1.10s/it]

{'loss': 0.9021, 'learning_rate': 4.865846153846154e-05, 'epoch': 0.08}


  3%|▎         | 3300/121875 [1:01:28<36:21:13,  1.10s/it]

{'loss': 0.9497, 'learning_rate': 4.864615384615385e-05, 'epoch': 0.08}


  3%|▎         | 3330/121875 [1:02:02<36:23:11,  1.10s/it]

{'loss': 0.9561, 'learning_rate': 4.8633846153846155e-05, 'epoch': 0.08}


  3%|▎         | 3360/121875 [1:02:35<36:16:21,  1.10s/it]

{'loss': 0.8401, 'learning_rate': 4.862153846153846e-05, 'epoch': 0.08}


  3%|▎         | 3390/121875 [1:03:08<36:13:04,  1.10s/it]

{'loss': 0.9392, 'learning_rate': 4.860923076923077e-05, 'epoch': 0.08}


  3%|▎         | 3420/121875 [1:03:40<35:25:17,  1.08s/it]

{'loss': 0.8016, 'learning_rate': 4.859692307692308e-05, 'epoch': 0.08}


  3%|▎         | 3450/121875 [1:04:13<35:26:50,  1.08s/it]

{'loss': 0.834, 'learning_rate': 4.858461538461539e-05, 'epoch': 0.08}


  3%|▎         | 3480/121875 [1:04:45<36:05:44,  1.10s/it]

{'loss': 0.9092, 'learning_rate': 4.85723076923077e-05, 'epoch': 0.09}


  3%|▎         | 3510/121875 [1:05:21<37:04:50,  1.13s/it]

{'loss': 0.9486, 'learning_rate': 4.856e-05, 'epoch': 0.09}


  3%|▎         | 3540/121875 [1:05:54<36:18:50,  1.10s/it]

{'loss': 0.9214, 'learning_rate': 4.854769230769231e-05, 'epoch': 0.09}


  3%|▎         | 3570/121875 [1:06:27<36:15:31,  1.10s/it]

{'loss': 0.8873, 'learning_rate': 4.8535384615384614e-05, 'epoch': 0.09}


  3%|▎         | 3600/121875 [1:07:00<36:10:05,  1.10s/it]

{'loss': 0.9213, 'learning_rate': 4.8523076923076927e-05, 'epoch': 0.09}


  3%|▎         | 3630/121875 [1:07:33<36:10:32,  1.10s/it]

{'loss': 0.8437, 'learning_rate': 4.851076923076923e-05, 'epoch': 0.09}


  3%|▎         | 3660/121875 [1:08:06<36:04:31,  1.10s/it]

{'loss': 0.898, 'learning_rate': 4.8498461538461545e-05, 'epoch': 0.09}


  3%|▎         | 3690/121875 [1:08:39<36:17:46,  1.11s/it]

{'loss': 0.829, 'learning_rate': 4.848615384615385e-05, 'epoch': 0.09}


  3%|▎         | 3720/121875 [1:09:12<36:05:56,  1.10s/it]

{'loss': 0.8683, 'learning_rate': 4.8473846153846156e-05, 'epoch': 0.09}


  3%|▎         | 3750/121875 [1:09:45<36:23:04,  1.11s/it]

{'loss': 0.8417, 'learning_rate': 4.846153846153846e-05, 'epoch': 0.09}


  3%|▎         | 3780/121875 [1:10:18<36:03:39,  1.10s/it]

{'loss': 0.8441, 'learning_rate': 4.844923076923077e-05, 'epoch': 0.09}


  3%|▎         | 3810/121875 [1:10:51<36:05:12,  1.10s/it]

{'loss': 0.9269, 'learning_rate': 4.843692307692308e-05, 'epoch': 0.09}


  3%|▎         | 3840/121875 [1:11:25<36:13:26,  1.10s/it]

{'loss': 0.8568, 'learning_rate': 4.8424615384615386e-05, 'epoch': 0.09}


  3%|▎         | 3870/121875 [1:11:58<36:03:23,  1.10s/it]

{'loss': 0.8981, 'learning_rate': 4.84123076923077e-05, 'epoch': 0.1}


  3%|▎         | 3900/121875 [1:12:31<36:12:40,  1.10s/it]

{'loss': 0.8777, 'learning_rate': 4.8400000000000004e-05, 'epoch': 0.1}


  3%|▎         | 3930/121875 [1:13:04<36:03:47,  1.10s/it]

{'loss': 0.9073, 'learning_rate': 4.838769230769231e-05, 'epoch': 0.1}


  3%|▎         | 3960/121875 [1:13:37<36:13:18,  1.11s/it]

{'loss': 0.864, 'learning_rate': 4.8375384615384616e-05, 'epoch': 0.1}


  3%|▎         | 3990/121875 [1:14:10<35:57:58,  1.10s/it]

{'loss': 0.9476, 'learning_rate': 4.836307692307693e-05, 'epoch': 0.1}


  3%|▎         | 4020/121875 [1:14:45<36:03:38,  1.10s/it]

{'loss': 0.8266, 'learning_rate': 4.8350769230769234e-05, 'epoch': 0.1}


  3%|▎         | 4050/121875 [1:15:18<36:05:46,  1.10s/it]

{'loss': 0.8254, 'learning_rate': 4.833846153846154e-05, 'epoch': 0.1}


  3%|▎         | 4080/121875 [1:15:51<36:09:47,  1.11s/it]

{'loss': 0.8724, 'learning_rate': 4.832615384615385e-05, 'epoch': 0.1}


  3%|▎         | 4110/121875 [1:16:24<36:00:23,  1.10s/it]

{'loss': 0.8605, 'learning_rate': 4.831384615384615e-05, 'epoch': 0.1}


  3%|▎         | 4140/121875 [1:16:57<35:59:57,  1.10s/it]

{'loss': 0.9046, 'learning_rate': 4.8301538461538464e-05, 'epoch': 0.1}


  3%|▎         | 4170/121875 [1:17:31<36:12:37,  1.11s/it]

{'loss': 0.8896, 'learning_rate': 4.828923076923077e-05, 'epoch': 0.1}


  3%|▎         | 4200/121875 [1:18:04<36:00:07,  1.10s/it]

{'loss': 0.9613, 'learning_rate': 4.827692307692308e-05, 'epoch': 0.1}


  3%|▎         | 4230/121875 [1:18:37<36:00:58,  1.10s/it]

{'loss': 0.8948, 'learning_rate': 4.826461538461539e-05, 'epoch': 0.1}


  3%|▎         | 4260/121875 [1:19:10<36:09:54,  1.11s/it]

{'loss': 0.8304, 'learning_rate': 4.82523076923077e-05, 'epoch': 0.1}


  4%|▎         | 4290/121875 [1:19:43<36:06:08,  1.11s/it]

{'loss': 0.8294, 'learning_rate': 4.824e-05, 'epoch': 0.11}


  4%|▎         | 4320/121875 [1:20:16<35:56:00,  1.10s/it]

{'loss': 0.889, 'learning_rate': 4.822769230769231e-05, 'epoch': 0.11}


  4%|▎         | 4350/121875 [1:20:49<35:57:19,  1.10s/it]

{'loss': 0.8556, 'learning_rate': 4.821538461538462e-05, 'epoch': 0.11}


  4%|▎         | 4380/121875 [1:21:22<36:04:49,  1.11s/it]

{'loss': 0.9046, 'learning_rate': 4.820307692307692e-05, 'epoch': 0.11}


  4%|▎         | 4410/121875 [1:21:55<36:02:11,  1.10s/it]

{'loss': 0.8593, 'learning_rate': 4.8190769230769235e-05, 'epoch': 0.11}


  4%|▎         | 4440/121875 [1:22:28<35:46:50,  1.10s/it]

{'loss': 0.8464, 'learning_rate': 4.817846153846154e-05, 'epoch': 0.11}


  4%|▎         | 4470/121875 [1:23:01<35:57:26,  1.10s/it]

{'loss': 0.9009, 'learning_rate': 4.816615384615385e-05, 'epoch': 0.11}


  4%|▎         | 4500/121875 [1:23:34<35:53:14,  1.10s/it]

{'loss': 0.8268, 'learning_rate': 4.815384615384615e-05, 'epoch': 0.11}


  4%|▎         | 4530/121875 [1:24:10<36:03:41,  1.11s/it]

{'loss': 0.8415, 'learning_rate': 4.8141538461538465e-05, 'epoch': 0.11}


  4%|▎         | 4560/121875 [1:24:43<35:51:40,  1.10s/it]

{'loss': 0.852, 'learning_rate': 4.812923076923077e-05, 'epoch': 0.11}


  4%|▍         | 4590/121875 [1:25:16<35:50:31,  1.10s/it]

{'loss': 0.8966, 'learning_rate': 4.811692307692308e-05, 'epoch': 0.11}


  4%|▍         | 4620/121875 [1:25:49<35:53:50,  1.10s/it]

{'loss': 0.7965, 'learning_rate': 4.810461538461539e-05, 'epoch': 0.11}


  4%|▍         | 4650/121875 [1:26:22<35:38:07,  1.09s/it]

{'loss': 0.8791, 'learning_rate': 4.8092307692307695e-05, 'epoch': 0.11}


  4%|▍         | 4680/121875 [1:26:55<35:44:32,  1.10s/it]

{'loss': 0.899, 'learning_rate': 4.808e-05, 'epoch': 0.12}


  4%|▍         | 4710/121875 [1:27:28<35:54:16,  1.10s/it]

{'loss': 0.9097, 'learning_rate': 4.8067692307692306e-05, 'epoch': 0.12}


  4%|▍         | 4740/121875 [1:28:01<35:53:01,  1.10s/it]

{'loss': 0.8012, 'learning_rate': 4.805538461538462e-05, 'epoch': 0.12}


  4%|▍         | 4770/121875 [1:28:34<36:00:10,  1.11s/it]

{'loss': 0.8623, 'learning_rate': 4.8043076923076924e-05, 'epoch': 0.12}


  4%|▍         | 4800/121875 [1:29:07<36:01:03,  1.11s/it]

{'loss': 0.8492, 'learning_rate': 4.803076923076924e-05, 'epoch': 0.12}


  4%|▍         | 4830/121875 [1:29:40<35:51:00,  1.10s/it]

{'loss': 0.9227, 'learning_rate': 4.801846153846154e-05, 'epoch': 0.12}


  4%|▍         | 4860/121875 [1:30:13<35:42:28,  1.10s/it]

{'loss': 0.8723, 'learning_rate': 4.800615384615385e-05, 'epoch': 0.12}


  4%|▍         | 4890/121875 [1:30:46<35:47:39,  1.10s/it]

{'loss': 0.8698, 'learning_rate': 4.7993846153846154e-05, 'epoch': 0.12}


  4%|▍         | 4920/121875 [1:31:19<35:41:53,  1.10s/it]

{'loss': 0.8298, 'learning_rate': 4.798153846153846e-05, 'epoch': 0.12}


  4%|▍         | 4950/121875 [1:31:53<35:52:19,  1.10s/it]

{'loss': 0.8515, 'learning_rate': 4.796923076923077e-05, 'epoch': 0.12}


  4%|▍         | 4980/121875 [1:32:26<35:51:51,  1.10s/it]

{'loss': 0.893, 'learning_rate': 4.795692307692308e-05, 'epoch': 0.12}


  4%|▍         | 5010/121875 [1:33:01<36:36:30,  1.13s/it]

{'loss': 0.8545, 'learning_rate': 4.794461538461539e-05, 'epoch': 0.12}


  4%|▍         | 5040/121875 [1:33:34<35:44:50,  1.10s/it]

{'loss': 0.8492, 'learning_rate': 4.7932307692307696e-05, 'epoch': 0.12}


  4%|▍         | 5070/121875 [1:34:07<35:49:43,  1.10s/it]

{'loss': 0.843, 'learning_rate': 4.792e-05, 'epoch': 0.12}


  4%|▍         | 5100/121875 [1:34:40<35:45:26,  1.10s/it]

{'loss': 0.9092, 'learning_rate': 4.790769230769231e-05, 'epoch': 0.13}


  4%|▍         | 5130/121875 [1:35:13<35:53:35,  1.11s/it]

{'loss': 0.7922, 'learning_rate': 4.789538461538462e-05, 'epoch': 0.13}


  4%|▍         | 5160/121875 [1:35:46<35:52:44,  1.11s/it]

{'loss': 0.8697, 'learning_rate': 4.7883076923076926e-05, 'epoch': 0.13}


  4%|▍         | 5190/121875 [1:36:19<35:44:27,  1.10s/it]

{'loss': 0.8556, 'learning_rate': 4.787076923076924e-05, 'epoch': 0.13}


  4%|▍         | 5220/121875 [1:36:53<36:02:50,  1.11s/it]

{'loss': 0.8367, 'learning_rate': 4.7858461538461544e-05, 'epoch': 0.13}


  4%|▍         | 5250/121875 [1:37:26<35:47:48,  1.10s/it]

{'loss': 0.8695, 'learning_rate': 4.784615384615384e-05, 'epoch': 0.13}


  4%|▍         | 5280/121875 [1:37:59<35:43:32,  1.10s/it]

{'loss': 0.7851, 'learning_rate': 4.7833846153846156e-05, 'epoch': 0.13}


  4%|▍         | 5310/121875 [1:38:32<35:45:02,  1.10s/it]

{'loss': 0.8322, 'learning_rate': 4.782153846153846e-05, 'epoch': 0.13}


  4%|▍         | 5340/121875 [1:39:05<35:48:49,  1.11s/it]

{'loss': 0.8108, 'learning_rate': 4.7809230769230774e-05, 'epoch': 0.13}


  4%|▍         | 5370/121875 [1:39:38<35:33:59,  1.10s/it]

{'loss': 0.821, 'learning_rate': 4.779692307692308e-05, 'epoch': 0.13}


  4%|▍         | 5400/121875 [1:40:11<35:45:18,  1.11s/it]

{'loss': 0.8533, 'learning_rate': 4.778461538461539e-05, 'epoch': 0.13}


  4%|▍         | 5430/121875 [1:40:44<35:45:38,  1.11s/it]

{'loss': 0.8986, 'learning_rate': 4.777230769230769e-05, 'epoch': 0.13}


  4%|▍         | 5460/121875 [1:41:18<35:48:34,  1.11s/it]

{'loss': 0.8384, 'learning_rate': 4.7760000000000004e-05, 'epoch': 0.13}


  5%|▍         | 5490/121875 [1:41:51<35:33:25,  1.10s/it]

{'loss': 0.9377, 'learning_rate': 4.774769230769231e-05, 'epoch': 0.14}


  5%|▍         | 5520/121875 [1:42:26<35:46:55,  1.11s/it]

{'loss': 0.9114, 'learning_rate': 4.7735384615384615e-05, 'epoch': 0.14}


  5%|▍         | 5550/121875 [1:42:59<35:30:29,  1.10s/it]

{'loss': 0.8497, 'learning_rate': 4.772307692307693e-05, 'epoch': 0.14}


  5%|▍         | 5580/121875 [1:43:32<35:34:30,  1.10s/it]

{'loss': 0.8539, 'learning_rate': 4.771076923076923e-05, 'epoch': 0.14}


  5%|▍         | 5610/121875 [1:44:05<35:33:26,  1.10s/it]

{'loss': 0.8424, 'learning_rate': 4.769846153846154e-05, 'epoch': 0.14}


  5%|▍         | 5640/121875 [1:44:38<35:29:39,  1.10s/it]

{'loss': 0.78, 'learning_rate': 4.7686153846153845e-05, 'epoch': 0.14}


  5%|▍         | 5670/121875 [1:45:11<35:37:19,  1.10s/it]

{'loss': 0.8667, 'learning_rate': 4.767384615384616e-05, 'epoch': 0.14}


  5%|▍         | 5700/121875 [1:45:44<35:26:17,  1.10s/it]

{'loss': 0.8981, 'learning_rate': 4.766153846153846e-05, 'epoch': 0.14}


  5%|▍         | 5730/121875 [1:46:17<35:34:29,  1.10s/it]

{'loss': 0.9009, 'learning_rate': 4.7649230769230775e-05, 'epoch': 0.14}


  5%|▍         | 5760/121875 [1:46:50<35:23:04,  1.10s/it]

{'loss': 0.8912, 'learning_rate': 4.763692307692308e-05, 'epoch': 0.14}


  5%|▍         | 5790/121875 [1:47:23<35:23:03,  1.10s/it]

{'loss': 0.8917, 'learning_rate': 4.762461538461539e-05, 'epoch': 0.14}


  5%|▍         | 5820/121875 [1:47:56<35:31:21,  1.10s/it]

{'loss': 0.84, 'learning_rate': 4.761230769230769e-05, 'epoch': 0.14}


  5%|▍         | 5850/121875 [1:48:29<35:26:58,  1.10s/it]

{'loss': 0.8028, 'learning_rate': 4.76e-05, 'epoch': 0.14}


  5%|▍         | 5880/121875 [1:49:03<35:38:23,  1.11s/it]

{'loss': 0.8643, 'learning_rate': 4.758769230769231e-05, 'epoch': 0.14}


  5%|▍         | 5910/121875 [1:49:36<35:25:30,  1.10s/it]

{'loss': 0.817, 'learning_rate': 4.7575384615384616e-05, 'epoch': 0.15}


  5%|▍         | 5940/121875 [1:50:09<35:32:35,  1.10s/it]

{'loss': 0.8866, 'learning_rate': 4.756307692307693e-05, 'epoch': 0.15}


  5%|▍         | 5970/121875 [1:50:42<35:24:45,  1.10s/it]

{'loss': 0.8335, 'learning_rate': 4.7550769230769235e-05, 'epoch': 0.15}


  5%|▍         | 6000/121875 [1:51:15<35:31:34,  1.10s/it]

{'loss': 0.8226, 'learning_rate': 4.753846153846154e-05, 'epoch': 0.15}


  5%|▍         | 6030/121875 [1:51:50<35:31:35,  1.10s/it]

{'loss': 0.8244, 'learning_rate': 4.7526153846153846e-05, 'epoch': 0.15}


  5%|▍         | 6060/121875 [1:52:23<35:38:53,  1.11s/it]

{'loss': 0.8303, 'learning_rate': 4.751384615384616e-05, 'epoch': 0.15}


  5%|▍         | 6090/121875 [1:52:56<35:24:36,  1.10s/it]

{'loss': 0.8648, 'learning_rate': 4.7501538461538464e-05, 'epoch': 0.15}


  5%|▌         | 6120/121875 [1:53:29<35:22:06,  1.10s/it]

{'loss': 0.8796, 'learning_rate': 4.748923076923077e-05, 'epoch': 0.15}


  5%|▌         | 6150/121875 [1:54:02<35:33:38,  1.11s/it]

{'loss': 0.8919, 'learning_rate': 4.747692307692308e-05, 'epoch': 0.15}


  5%|▌         | 6180/121875 [1:54:35<35:20:26,  1.10s/it]

{'loss': 0.8606, 'learning_rate': 4.746461538461539e-05, 'epoch': 0.15}


  5%|▌         | 6210/121875 [1:55:08<35:29:27,  1.10s/it]

{'loss': 0.7949, 'learning_rate': 4.7452307692307694e-05, 'epoch': 0.15}


  5%|▌         | 6240/121875 [1:55:41<35:22:24,  1.10s/it]

{'loss': 0.8321, 'learning_rate': 4.744e-05, 'epoch': 0.15}


  5%|▌         | 6270/121875 [1:56:14<35:33:43,  1.11s/it]

{'loss': 0.844, 'learning_rate': 4.742769230769231e-05, 'epoch': 0.15}


  5%|▌         | 6300/121875 [1:56:48<35:21:28,  1.10s/it]

{'loss': 0.783, 'learning_rate': 4.741538461538462e-05, 'epoch': 0.16}


  5%|▌         | 6330/121875 [1:57:21<35:22:17,  1.10s/it]

{'loss': 0.8162, 'learning_rate': 4.740307692307693e-05, 'epoch': 0.16}


  5%|▌         | 6360/121875 [1:57:54<35:30:32,  1.11s/it]

{'loss': 0.8753, 'learning_rate': 4.7390769230769236e-05, 'epoch': 0.16}


  5%|▌         | 6390/121875 [1:58:27<35:39:52,  1.11s/it]

{'loss': 0.8165, 'learning_rate': 4.7378461538461535e-05, 'epoch': 0.16}


  5%|▌         | 6420/121875 [1:59:00<35:13:49,  1.10s/it]

{'loss': 0.8869, 'learning_rate': 4.736615384615385e-05, 'epoch': 0.16}


  5%|▌         | 6450/121875 [1:59:33<35:19:03,  1.10s/it]

{'loss': 0.8889, 'learning_rate': 4.7353846153846153e-05, 'epoch': 0.16}


  5%|▌         | 6480/121875 [2:00:06<35:17:40,  1.10s/it]

{'loss': 0.8158, 'learning_rate': 4.7341538461538466e-05, 'epoch': 0.16}


  5%|▌         | 6510/121875 [2:00:41<36:15:47,  1.13s/it]

{'loss': 0.9056, 'learning_rate': 4.732923076923077e-05, 'epoch': 0.16}


  5%|▌         | 6540/121875 [2:01:14<35:18:49,  1.10s/it]

{'loss': 0.8757, 'learning_rate': 4.7316923076923084e-05, 'epoch': 0.16}


  5%|▌         | 6570/121875 [2:01:47<35:15:03,  1.10s/it]

{'loss': 0.8869, 'learning_rate': 4.730461538461538e-05, 'epoch': 0.16}


  5%|▌         | 6600/121875 [2:02:20<35:19:44,  1.10s/it]

{'loss': 0.7544, 'learning_rate': 4.7292307692307696e-05, 'epoch': 0.16}


  5%|▌         | 6630/121875 [2:02:53<35:19:16,  1.10s/it]

{'loss': 0.9045, 'learning_rate': 4.728e-05, 'epoch': 0.16}


  5%|▌         | 6660/121875 [2:03:27<35:03:35,  1.10s/it]

{'loss': 0.8329, 'learning_rate': 4.726769230769231e-05, 'epoch': 0.16}


  5%|▌         | 6690/121875 [2:04:00<35:07:09,  1.10s/it]

{'loss': 0.8221, 'learning_rate': 4.725538461538462e-05, 'epoch': 0.16}


  6%|▌         | 6720/121875 [2:04:33<35:12:49,  1.10s/it]

{'loss': 0.8445, 'learning_rate': 4.7243076923076925e-05, 'epoch': 0.17}


  6%|▌         | 6750/121875 [2:05:06<35:21:41,  1.11s/it]

{'loss': 0.8936, 'learning_rate': 4.723076923076923e-05, 'epoch': 0.17}


  6%|▌         | 6780/121875 [2:05:39<35:24:20,  1.11s/it]

{'loss': 0.8628, 'learning_rate': 4.721846153846154e-05, 'epoch': 0.17}


  6%|▌         | 6810/121875 [2:06:12<35:15:36,  1.10s/it]

{'loss': 0.8639, 'learning_rate': 4.720615384615385e-05, 'epoch': 0.17}


  6%|▌         | 6840/121875 [2:06:45<35:16:13,  1.10s/it]

{'loss': 0.8755, 'learning_rate': 4.7193846153846155e-05, 'epoch': 0.17}


  6%|▌         | 6870/121875 [2:07:18<35:16:34,  1.10s/it]

{'loss': 0.8444, 'learning_rate': 4.718153846153847e-05, 'epoch': 0.17}


  6%|▌         | 6900/121875 [2:07:51<35:03:11,  1.10s/it]

{'loss': 0.8422, 'learning_rate': 4.716923076923077e-05, 'epoch': 0.17}


  6%|▌         | 6930/121875 [2:08:24<35:08:27,  1.10s/it]

{'loss': 0.8966, 'learning_rate': 4.715692307692308e-05, 'epoch': 0.17}


  6%|▌         | 6960/121875 [2:08:57<35:19:58,  1.11s/it]

{'loss': 0.9063, 'learning_rate': 4.7144615384615385e-05, 'epoch': 0.17}


  6%|▌         | 6990/121875 [2:09:30<35:07:43,  1.10s/it]

{'loss': 0.8482, 'learning_rate': 4.713230769230769e-05, 'epoch': 0.17}


  6%|▌         | 7020/121875 [2:10:05<35:12:49,  1.10s/it]

{'loss': 0.8198, 'learning_rate': 4.712e-05, 'epoch': 0.17}


  6%|▌         | 7050/121875 [2:10:38<34:56:37,  1.10s/it]

{'loss': 0.8342, 'learning_rate': 4.710769230769231e-05, 'epoch': 0.17}


  6%|▌         | 7080/121875 [2:11:11<35:01:30,  1.10s/it]

{'loss': 0.8432, 'learning_rate': 4.709538461538462e-05, 'epoch': 0.17}


  6%|▌         | 7110/121875 [2:11:44<35:06:34,  1.10s/it]

{'loss': 0.8442, 'learning_rate': 4.708307692307693e-05, 'epoch': 0.18}


  6%|▌         | 7140/121875 [2:12:17<35:01:36,  1.10s/it]

{'loss': 0.9239, 'learning_rate': 4.707076923076923e-05, 'epoch': 0.18}


  6%|▌         | 7170/121875 [2:12:50<35:06:19,  1.10s/it]

{'loss': 0.8464, 'learning_rate': 4.705846153846154e-05, 'epoch': 0.18}


  6%|▌         | 7200/121875 [2:13:23<35:03:30,  1.10s/it]

{'loss': 0.8192, 'learning_rate': 4.704615384615385e-05, 'epoch': 0.18}


  6%|▌         | 7230/121875 [2:13:57<35:13:50,  1.11s/it]

{'loss': 0.8546, 'learning_rate': 4.7033846153846156e-05, 'epoch': 0.18}


  6%|▌         | 7260/121875 [2:14:30<35:00:33,  1.10s/it]

{'loss': 0.8919, 'learning_rate': 4.702153846153846e-05, 'epoch': 0.18}


  6%|▌         | 7290/121875 [2:15:03<35:12:40,  1.11s/it]

{'loss': 0.768, 'learning_rate': 4.7009230769230775e-05, 'epoch': 0.18}


  6%|▌         | 7320/121875 [2:15:36<35:04:18,  1.10s/it]

{'loss': 0.7955, 'learning_rate': 4.699692307692308e-05, 'epoch': 0.18}


  6%|▌         | 7350/121875 [2:16:09<34:56:46,  1.10s/it]

{'loss': 0.8474, 'learning_rate': 4.6984615384615386e-05, 'epoch': 0.18}


  6%|▌         | 7380/121875 [2:16:42<35:02:27,  1.10s/it]

{'loss': 0.8957, 'learning_rate': 4.697230769230769e-05, 'epoch': 0.18}


  6%|▌         | 7410/121875 [2:17:15<35:01:59,  1.10s/it]

{'loss': 0.776, 'learning_rate': 4.6960000000000004e-05, 'epoch': 0.18}


  6%|▌         | 7440/121875 [2:17:48<35:04:18,  1.10s/it]

{'loss': 0.8618, 'learning_rate': 4.694769230769231e-05, 'epoch': 0.18}


  6%|▌         | 7470/121875 [2:18:21<34:57:00,  1.10s/it]

{'loss': 0.9094, 'learning_rate': 4.693538461538462e-05, 'epoch': 0.18}


  6%|▌         | 7500/121875 [2:18:54<34:49:54,  1.10s/it]

{'loss': 0.8126, 'learning_rate': 4.692307692307693e-05, 'epoch': 0.18}


  6%|▌         | 7530/121875 [2:19:29<34:56:33,  1.10s/it]

{'loss': 0.8107, 'learning_rate': 4.691076923076923e-05, 'epoch': 0.19}


  6%|▌         | 7560/121875 [2:20:02<34:59:06,  1.10s/it]

{'loss': 0.8354, 'learning_rate': 4.689846153846154e-05, 'epoch': 0.19}


  6%|▌         | 7590/121875 [2:20:35<34:59:34,  1.10s/it]

{'loss': 0.806, 'learning_rate': 4.6886153846153846e-05, 'epoch': 0.19}


  6%|▋         | 7620/121875 [2:21:08<34:48:12,  1.10s/it]

{'loss': 0.9159, 'learning_rate': 4.687384615384616e-05, 'epoch': 0.19}


  6%|▋         | 7650/121875 [2:21:41<34:57:17,  1.10s/it]

{'loss': 0.9067, 'learning_rate': 4.6861538461538464e-05, 'epoch': 0.19}


  6%|▋         | 7680/121875 [2:22:14<34:55:34,  1.10s/it]

{'loss': 0.7903, 'learning_rate': 4.6849230769230776e-05, 'epoch': 0.19}


  6%|▋         | 7710/121875 [2:22:47<34:56:48,  1.10s/it]

{'loss': 0.7745, 'learning_rate': 4.6836923076923075e-05, 'epoch': 0.19}


  6%|▋         | 7740/121875 [2:23:20<34:43:28,  1.10s/it]

{'loss': 0.8281, 'learning_rate': 4.682461538461539e-05, 'epoch': 0.19}


  6%|▋         | 7770/121875 [2:23:54<34:48:34,  1.10s/it]

{'loss': 0.8278, 'learning_rate': 4.6812307692307693e-05, 'epoch': 0.19}


  6%|▋         | 7800/121875 [2:24:27<35:00:00,  1.10s/it]

{'loss': 0.8262, 'learning_rate': 4.6800000000000006e-05, 'epoch': 0.19}


  6%|▋         | 7830/121875 [2:25:00<35:00:14,  1.10s/it]

{'loss': 0.8964, 'learning_rate': 4.678769230769231e-05, 'epoch': 0.19}


  6%|▋         | 7860/121875 [2:25:33<34:44:49,  1.10s/it]

{'loss': 0.8785, 'learning_rate': 4.677538461538462e-05, 'epoch': 0.19}


  6%|▋         | 7890/121875 [2:26:06<34:49:35,  1.10s/it]

{'loss': 0.9068, 'learning_rate': 4.676307692307692e-05, 'epoch': 0.19}


  6%|▋         | 7920/121875 [2:26:39<34:51:41,  1.10s/it]

{'loss': 0.8333, 'learning_rate': 4.675076923076923e-05, 'epoch': 0.19}


  7%|▋         | 7950/121875 [2:27:12<34:50:59,  1.10s/it]

{'loss': 0.8472, 'learning_rate': 4.673846153846154e-05, 'epoch': 0.2}


  7%|▋         | 7980/121875 [2:27:45<34:59:04,  1.11s/it]

{'loss': 0.7902, 'learning_rate': 4.672615384615385e-05, 'epoch': 0.2}


  7%|▋         | 8010/121875 [2:28:20<35:37:38,  1.13s/it]

{'loss': 0.7912, 'learning_rate': 4.671384615384616e-05, 'epoch': 0.2}


  7%|▋         | 8040/121875 [2:28:53<34:46:24,  1.10s/it]

{'loss': 0.8254, 'learning_rate': 4.6701538461538465e-05, 'epoch': 0.2}


  7%|▋         | 8070/121875 [2:29:26<34:47:27,  1.10s/it]

{'loss': 0.839, 'learning_rate': 4.668923076923077e-05, 'epoch': 0.2}


  7%|▋         | 8100/121875 [2:29:59<34:48:35,  1.10s/it]

{'loss': 0.836, 'learning_rate': 4.667692307692308e-05, 'epoch': 0.2}


  7%|▋         | 8130/121875 [2:30:32<34:44:59,  1.10s/it]

{'loss': 0.8256, 'learning_rate': 4.666461538461538e-05, 'epoch': 0.2}


  7%|▋         | 8160/121875 [2:31:06<34:47:20,  1.10s/it]

{'loss': 0.7907, 'learning_rate': 4.6652307692307695e-05, 'epoch': 0.2}


  7%|▋         | 8190/121875 [2:31:39<34:46:42,  1.10s/it]

{'loss': 0.8728, 'learning_rate': 4.664e-05, 'epoch': 0.2}


  7%|▋         | 8220/121875 [2:32:12<34:41:32,  1.10s/it]

{'loss': 0.8066, 'learning_rate': 4.662769230769231e-05, 'epoch': 0.2}


  7%|▋         | 8250/121875 [2:32:45<34:46:40,  1.10s/it]

{'loss': 0.8136, 'learning_rate': 4.661538461538462e-05, 'epoch': 0.2}


  7%|▋         | 8280/121875 [2:33:18<34:44:44,  1.10s/it]

{'loss': 0.8511, 'learning_rate': 4.6603076923076925e-05, 'epoch': 0.2}


  7%|▋         | 8310/121875 [2:33:51<34:49:14,  1.10s/it]

{'loss': 0.8313, 'learning_rate': 4.659076923076923e-05, 'epoch': 0.2}


  7%|▋         | 8340/121875 [2:34:24<34:38:27,  1.10s/it]

{'loss': 0.836, 'learning_rate': 4.657846153846154e-05, 'epoch': 0.21}


  7%|▋         | 8370/121875 [2:34:57<34:44:57,  1.10s/it]

{'loss': 0.756, 'learning_rate': 4.656615384615385e-05, 'epoch': 0.21}


  7%|▋         | 8400/121875 [2:35:30<34:32:11,  1.10s/it]

{'loss': 0.7733, 'learning_rate': 4.6553846153846154e-05, 'epoch': 0.21}


  7%|▋         | 8430/121875 [2:36:03<34:42:47,  1.10s/it]

{'loss': 0.8208, 'learning_rate': 4.654153846153847e-05, 'epoch': 0.21}


  7%|▋         | 8460/121875 [2:36:36<34:36:57,  1.10s/it]

{'loss': 0.8795, 'learning_rate': 4.652923076923077e-05, 'epoch': 0.21}


  7%|▋         | 8490/121875 [2:37:09<34:30:46,  1.10s/it]

{'loss': 0.8808, 'learning_rate': 4.651692307692308e-05, 'epoch': 0.21}


  7%|▋         | 8520/121875 [2:37:44<34:40:17,  1.10s/it]

{'loss': 0.8014, 'learning_rate': 4.6504615384615384e-05, 'epoch': 0.21}


  7%|▋         | 8550/121875 [2:38:17<34:42:31,  1.10s/it]

{'loss': 0.851, 'learning_rate': 4.6492307692307696e-05, 'epoch': 0.21}


  7%|▋         | 8580/121875 [2:38:50<34:38:34,  1.10s/it]

{'loss': 0.8518, 'learning_rate': 4.648e-05, 'epoch': 0.21}


  7%|▋         | 8610/121875 [2:39:23<34:37:31,  1.10s/it]

{'loss': 0.8177, 'learning_rate': 4.6467692307692315e-05, 'epoch': 0.21}


  7%|▋         | 8640/121875 [2:39:56<34:35:33,  1.10s/it]

{'loss': 0.8429, 'learning_rate': 4.645538461538462e-05, 'epoch': 0.21}


  7%|▋         | 8670/121875 [2:40:29<34:27:38,  1.10s/it]

{'loss': 0.7716, 'learning_rate': 4.6443076923076926e-05, 'epoch': 0.21}


  7%|▋         | 8700/121875 [2:41:02<34:41:57,  1.10s/it]

{'loss': 0.8506, 'learning_rate': 4.643076923076923e-05, 'epoch': 0.21}


  7%|▋         | 8730/121875 [2:41:35<34:37:51,  1.10s/it]

{'loss': 0.8126, 'learning_rate': 4.641846153846154e-05, 'epoch': 0.21}


  7%|▋         | 8760/121875 [2:42:08<34:30:17,  1.10s/it]

{'loss': 0.8542, 'learning_rate': 4.640615384615385e-05, 'epoch': 0.22}


  7%|▋         | 8790/121875 [2:42:41<34:31:04,  1.10s/it]

{'loss': 0.8822, 'learning_rate': 4.6393846153846156e-05, 'epoch': 0.22}


  7%|▋         | 8820/121875 [2:43:14<34:34:24,  1.10s/it]

{'loss': 0.7533, 'learning_rate': 4.638153846153847e-05, 'epoch': 0.22}


  7%|▋         | 8850/121875 [2:43:47<34:32:23,  1.10s/it]

{'loss': 0.8774, 'learning_rate': 4.636923076923077e-05, 'epoch': 0.22}


  7%|▋         | 8880/121875 [2:44:20<34:38:49,  1.10s/it]

{'loss': 0.8236, 'learning_rate': 4.635692307692308e-05, 'epoch': 0.22}


  7%|▋         | 8910/121875 [2:44:53<34:29:27,  1.10s/it]

{'loss': 0.8347, 'learning_rate': 4.6344615384615386e-05, 'epoch': 0.22}


  7%|▋         | 8940/121875 [2:45:26<34:31:35,  1.10s/it]

{'loss': 0.8395, 'learning_rate': 4.63323076923077e-05, 'epoch': 0.22}


  7%|▋         | 8970/121875 [2:45:59<34:25:49,  1.10s/it]

{'loss': 0.8659, 'learning_rate': 4.6320000000000004e-05, 'epoch': 0.22}


  7%|▋         | 9000/121875 [2:46:32<34:26:34,  1.10s/it]

{'loss': 0.7729, 'learning_rate': 4.630769230769231e-05, 'epoch': 0.22}


  7%|▋         | 9030/121875 [2:47:07<34:21:26,  1.10s/it]

{'loss': 0.8661, 'learning_rate': 4.6295384615384615e-05, 'epoch': 0.22}


  7%|▋         | 9060/121875 [2:47:40<34:22:46,  1.10s/it]

{'loss': 0.8441, 'learning_rate': 4.628307692307692e-05, 'epoch': 0.22}


  7%|▋         | 9090/121875 [2:48:13<34:30:19,  1.10s/it]

{'loss': 0.8377, 'learning_rate': 4.6270769230769233e-05, 'epoch': 0.22}


  7%|▋         | 9120/121875 [2:48:46<34:23:27,  1.10s/it]

{'loss': 0.831, 'learning_rate': 4.625846153846154e-05, 'epoch': 0.22}


  8%|▊         | 9150/121875 [2:49:19<34:29:59,  1.10s/it]

{'loss': 0.842, 'learning_rate': 4.624615384615385e-05, 'epoch': 0.23}


  8%|▊         | 9180/121875 [2:49:52<34:26:53,  1.10s/it]

{'loss': 0.7945, 'learning_rate': 4.623384615384616e-05, 'epoch': 0.23}


  8%|▊         | 9210/121875 [2:50:25<34:29:03,  1.10s/it]

{'loss': 0.8857, 'learning_rate': 4.622153846153846e-05, 'epoch': 0.23}


  8%|▊         | 9240/121875 [2:50:58<34:20:28,  1.10s/it]

{'loss': 0.7899, 'learning_rate': 4.620923076923077e-05, 'epoch': 0.23}


  8%|▊         | 9270/121875 [2:51:31<34:15:50,  1.10s/it]

{'loss': 0.9014, 'learning_rate': 4.6196923076923075e-05, 'epoch': 0.23}


  8%|▊         | 9300/121875 [2:52:04<34:19:55,  1.10s/it]

{'loss': 0.8041, 'learning_rate': 4.618461538461539e-05, 'epoch': 0.23}


  8%|▊         | 9330/121875 [2:52:37<34:23:48,  1.10s/it]

{'loss': 0.8876, 'learning_rate': 4.617230769230769e-05, 'epoch': 0.23}


  8%|▊         | 9360/121875 [2:53:10<34:22:57,  1.10s/it]

{'loss': 0.8472, 'learning_rate': 4.6160000000000005e-05, 'epoch': 0.23}


  8%|▊         | 9390/121875 [2:53:43<34:19:26,  1.10s/it]

{'loss': 0.8372, 'learning_rate': 4.614769230769231e-05, 'epoch': 0.23}


  8%|▊         | 9420/121875 [2:54:16<34:29:35,  1.10s/it]

{'loss': 0.7906, 'learning_rate': 4.613538461538462e-05, 'epoch': 0.23}


  8%|▊         | 9450/121875 [2:54:49<34:14:11,  1.10s/it]

{'loss': 0.8333, 'learning_rate': 4.612307692307692e-05, 'epoch': 0.23}


  8%|▊         | 9480/121875 [2:55:22<34:16:54,  1.10s/it]

{'loss': 0.8721, 'learning_rate': 4.6110769230769235e-05, 'epoch': 0.23}


  8%|▊         | 9510/121875 [2:55:58<35:06:41,  1.12s/it]

{'loss': 0.8453, 'learning_rate': 4.609846153846154e-05, 'epoch': 0.23}


  8%|▊         | 9540/121875 [2:56:31<34:13:05,  1.10s/it]

{'loss': 0.8547, 'learning_rate': 4.608615384615385e-05, 'epoch': 0.23}


  8%|▊         | 9570/121875 [2:57:04<34:22:08,  1.10s/it]

{'loss': 0.8435, 'learning_rate': 4.607384615384616e-05, 'epoch': 0.24}


  8%|▊         | 9600/121875 [2:57:37<34:22:03,  1.10s/it]

{'loss': 0.7874, 'learning_rate': 4.6061538461538465e-05, 'epoch': 0.24}


  8%|▊         | 9630/121875 [2:58:10<34:18:59,  1.10s/it]

{'loss': 0.7603, 'learning_rate': 4.604923076923077e-05, 'epoch': 0.24}


  8%|▊         | 9660/121875 [2:58:43<34:25:29,  1.10s/it]

{'loss': 0.8457, 'learning_rate': 4.6036923076923076e-05, 'epoch': 0.24}


  8%|▊         | 9690/121875 [2:59:16<34:18:18,  1.10s/it]

{'loss': 0.8206, 'learning_rate': 4.602461538461539e-05, 'epoch': 0.24}


  8%|▊         | 9720/121875 [2:59:49<34:23:16,  1.10s/it]

{'loss': 0.8276, 'learning_rate': 4.6012307692307694e-05, 'epoch': 0.24}


  8%|▊         | 9750/121875 [3:00:22<34:22:33,  1.10s/it]

{'loss': 0.8328, 'learning_rate': 4.600000000000001e-05, 'epoch': 0.24}


  8%|▊         | 9780/121875 [3:00:55<34:12:22,  1.10s/it]

{'loss': 0.8472, 'learning_rate': 4.598769230769231e-05, 'epoch': 0.24}


  8%|▊         | 9810/121875 [3:01:28<34:14:11,  1.10s/it]

{'loss': 0.8134, 'learning_rate': 4.597538461538462e-05, 'epoch': 0.24}


  8%|▊         | 9840/121875 [3:02:01<34:14:44,  1.10s/it]

{'loss': 0.8096, 'learning_rate': 4.5963076923076924e-05, 'epoch': 0.24}


  8%|▊         | 9870/121875 [3:02:33<34:05:56,  1.10s/it]

{'loss': 0.8561, 'learning_rate': 4.595076923076923e-05, 'epoch': 0.24}


  8%|▊         | 9900/121875 [3:03:06<34:14:23,  1.10s/it]

{'loss': 0.8047, 'learning_rate': 4.593846153846154e-05, 'epoch': 0.24}


  8%|▊         | 9930/121875 [3:03:39<34:11:40,  1.10s/it]

{'loss': 0.8307, 'learning_rate': 4.592615384615385e-05, 'epoch': 0.24}


  8%|▊         | 9960/121875 [3:04:12<34:14:51,  1.10s/it]

{'loss': 0.8766, 'learning_rate': 4.591384615384616e-05, 'epoch': 0.25}


  8%|▊         | 9990/121875 [3:04:45<34:14:24,  1.10s/it]

{'loss': 0.8773, 'learning_rate': 4.590153846153846e-05, 'epoch': 0.25}


  8%|▊         | 10020/121875 [3:05:21<34:17:05,  1.10s/it]

{'loss': 0.8829, 'learning_rate': 4.588923076923077e-05, 'epoch': 0.25}


  8%|▊         | 10050/121875 [3:05:54<34:10:18,  1.10s/it]

{'loss': 0.7458, 'learning_rate': 4.587692307692308e-05, 'epoch': 0.25}


  8%|▊         | 10080/121875 [3:06:27<34:08:00,  1.10s/it]

{'loss': 0.8473, 'learning_rate': 4.586461538461539e-05, 'epoch': 0.25}


  8%|▊         | 10110/121875 [3:07:00<34:09:02,  1.10s/it]

{'loss': 0.8243, 'learning_rate': 4.5852307692307696e-05, 'epoch': 0.25}


  8%|▊         | 10140/121875 [3:07:33<34:12:42,  1.10s/it]

{'loss': 0.7635, 'learning_rate': 4.584e-05, 'epoch': 0.25}


  8%|▊         | 10170/121875 [3:08:06<34:12:53,  1.10s/it]

{'loss': 0.8027, 'learning_rate': 4.582769230769231e-05, 'epoch': 0.25}


  8%|▊         | 10200/121875 [3:08:39<34:05:01,  1.10s/it]

{'loss': 0.8492, 'learning_rate': 4.581538461538461e-05, 'epoch': 0.25}


  8%|▊         | 10230/121875 [3:09:12<34:07:54,  1.10s/it]

{'loss': 0.8401, 'learning_rate': 4.5803076923076926e-05, 'epoch': 0.25}


  8%|▊         | 10260/121875 [3:09:45<34:06:10,  1.10s/it]

{'loss': 0.8516, 'learning_rate': 4.579076923076923e-05, 'epoch': 0.25}


  8%|▊         | 10290/121875 [3:10:18<34:10:17,  1.10s/it]

{'loss': 0.8486, 'learning_rate': 4.5778461538461544e-05, 'epoch': 0.25}


  8%|▊         | 10320/121875 [3:10:51<34:05:22,  1.10s/it]

{'loss': 0.7952, 'learning_rate': 4.576615384615385e-05, 'epoch': 0.25}


  8%|▊         | 10350/121875 [3:11:24<34:03:45,  1.10s/it]

{'loss': 0.8327, 'learning_rate': 4.5753846153846155e-05, 'epoch': 0.25}


  9%|▊         | 10380/121875 [3:11:57<34:02:28,  1.10s/it]

{'loss': 0.8095, 'learning_rate': 4.574153846153846e-05, 'epoch': 0.26}


  9%|▊         | 10410/121875 [3:12:30<34:00:24,  1.10s/it]

{'loss': 0.8576, 'learning_rate': 4.572923076923077e-05, 'epoch': 0.26}


  9%|▊         | 10440/121875 [3:13:03<34:01:26,  1.10s/it]

{'loss': 0.8736, 'learning_rate': 4.571692307692308e-05, 'epoch': 0.26}


  9%|▊         | 10470/121875 [3:13:36<34:00:51,  1.10s/it]

{'loss': 0.8009, 'learning_rate': 4.5704615384615385e-05, 'epoch': 0.26}


  9%|▊         | 10500/121875 [3:14:09<34:07:28,  1.10s/it]

{'loss': 0.8354, 'learning_rate': 4.56923076923077e-05, 'epoch': 0.26}


  9%|▊         | 10530/121875 [3:14:44<34:02:53,  1.10s/it]

{'loss': 0.8421, 'learning_rate': 4.568e-05, 'epoch': 0.26}


  9%|▊         | 10560/121875 [3:15:17<33:52:17,  1.10s/it]

{'loss': 0.8342, 'learning_rate': 4.566769230769231e-05, 'epoch': 0.26}


  9%|▊         | 10590/121875 [3:15:50<34:02:53,  1.10s/it]

{'loss': 0.8443, 'learning_rate': 4.5655384615384615e-05, 'epoch': 0.26}


  9%|▊         | 10620/121875 [3:16:23<33:54:30,  1.10s/it]

{'loss': 0.8224, 'learning_rate': 4.564307692307693e-05, 'epoch': 0.26}


  9%|▊         | 10650/121875 [3:16:56<33:53:59,  1.10s/it]

{'loss': 0.8292, 'learning_rate': 4.563076923076923e-05, 'epoch': 0.26}


  9%|▉         | 10680/121875 [3:17:29<33:48:23,  1.09s/it]

{'loss': 0.7842, 'learning_rate': 4.5618461538461545e-05, 'epoch': 0.26}


  9%|▉         | 10710/121875 [3:18:02<33:53:44,  1.10s/it]

{'loss': 0.811, 'learning_rate': 4.560615384615385e-05, 'epoch': 0.26}


  9%|▉         | 10740/121875 [3:18:35<33:47:17,  1.09s/it]

{'loss': 0.9049, 'learning_rate': 4.559384615384616e-05, 'epoch': 0.26}


  9%|▉         | 10770/121875 [3:19:08<33:41:25,  1.09s/it]

{'loss': 0.8128, 'learning_rate': 4.558153846153846e-05, 'epoch': 0.27}


  9%|▉         | 10800/121875 [3:19:41<33:58:59,  1.10s/it]

{'loss': 0.8095, 'learning_rate': 4.556923076923077e-05, 'epoch': 0.27}


  9%|▉         | 10830/121875 [3:20:13<33:53:02,  1.10s/it]

{'loss': 0.8177, 'learning_rate': 4.555692307692308e-05, 'epoch': 0.27}


  9%|▉         | 10860/121875 [3:20:47<33:49:14,  1.10s/it]

{'loss': 0.8831, 'learning_rate': 4.5544615384615386e-05, 'epoch': 0.27}


  9%|▉         | 10890/121875 [3:21:20<33:55:42,  1.10s/it]

{'loss': 0.8012, 'learning_rate': 4.55323076923077e-05, 'epoch': 0.27}


  9%|▉         | 10920/121875 [3:21:53<33:54:24,  1.10s/it]

{'loss': 0.8384, 'learning_rate': 4.5520000000000005e-05, 'epoch': 0.27}


  9%|▉         | 10950/121875 [3:22:26<33:56:43,  1.10s/it]

{'loss': 0.8061, 'learning_rate': 4.550769230769231e-05, 'epoch': 0.27}


  9%|▉         | 10980/121875 [3:22:58<34:00:08,  1.10s/it]

{'loss': 0.8175, 'learning_rate': 4.5495384615384616e-05, 'epoch': 0.27}


  9%|▉         | 11010/121875 [3:23:34<34:44:08,  1.13s/it]

{'loss': 0.8392, 'learning_rate': 4.548307692307692e-05, 'epoch': 0.27}


  9%|▉         | 11040/121875 [3:24:07<33:48:11,  1.10s/it]

{'loss': 0.8002, 'learning_rate': 4.5470769230769234e-05, 'epoch': 0.27}


  9%|▉         | 11070/121875 [3:24:40<34:03:34,  1.11s/it]

{'loss': 0.83, 'learning_rate': 4.545846153846154e-05, 'epoch': 0.27}


  9%|▉         | 11100/121875 [3:25:13<33:46:43,  1.10s/it]

{'loss': 0.7878, 'learning_rate': 4.544615384615385e-05, 'epoch': 0.27}


  9%|▉         | 11130/121875 [3:25:46<33:57:14,  1.10s/it]

{'loss': 0.8356, 'learning_rate': 4.543384615384615e-05, 'epoch': 0.27}


  9%|▉         | 11160/121875 [3:26:19<33:53:09,  1.10s/it]

{'loss': 0.805, 'learning_rate': 4.5421538461538464e-05, 'epoch': 0.27}


  9%|▉         | 11190/121875 [3:26:52<33:44:57,  1.10s/it]

{'loss': 0.8112, 'learning_rate': 4.540923076923077e-05, 'epoch': 0.28}


  9%|▉         | 11220/121875 [3:27:25<33:50:34,  1.10s/it]

{'loss': 0.8586, 'learning_rate': 4.539692307692308e-05, 'epoch': 0.28}


  9%|▉         | 11250/121875 [3:27:58<33:44:24,  1.10s/it]

{'loss': 0.8573, 'learning_rate': 4.538461538461539e-05, 'epoch': 0.28}


  9%|▉         | 11280/121875 [3:28:31<33:54:44,  1.10s/it]

{'loss': 0.795, 'learning_rate': 4.5372307692307694e-05, 'epoch': 0.28}


  9%|▉         | 11310/121875 [3:29:04<33:46:29,  1.10s/it]

{'loss': 0.8602, 'learning_rate': 4.536e-05, 'epoch': 0.28}


  9%|▉         | 11340/121875 [3:29:37<33:45:20,  1.10s/it]

{'loss': 0.828, 'learning_rate': 4.5347692307692305e-05, 'epoch': 0.28}


  9%|▉         | 11370/121875 [3:30:10<33:53:30,  1.10s/it]

{'loss': 0.8133, 'learning_rate': 4.533538461538462e-05, 'epoch': 0.28}


  9%|▉         | 11400/121875 [3:30:43<33:51:07,  1.10s/it]

{'loss': 0.9412, 'learning_rate': 4.532307692307692e-05, 'epoch': 0.28}


  9%|▉         | 11430/121875 [3:31:16<33:48:54,  1.10s/it]

{'loss': 0.8262, 'learning_rate': 4.5310769230769236e-05, 'epoch': 0.28}


  9%|▉         | 11460/121875 [3:31:49<33:45:15,  1.10s/it]

{'loss': 0.8169, 'learning_rate': 4.529846153846154e-05, 'epoch': 0.28}


  9%|▉         | 11490/121875 [3:32:22<33:45:43,  1.10s/it]

{'loss': 0.8672, 'learning_rate': 4.528615384615385e-05, 'epoch': 0.28}


  9%|▉         | 11520/121875 [3:32:57<33:43:35,  1.10s/it]

{'loss': 0.8229, 'learning_rate': 4.527384615384615e-05, 'epoch': 0.28}


  9%|▉         | 11550/121875 [3:33:30<33:40:49,  1.10s/it]

{'loss': 0.813, 'learning_rate': 4.5261538461538466e-05, 'epoch': 0.28}


 10%|▉         | 11580/121875 [3:34:03<33:41:04,  1.10s/it]

{'loss': 0.802, 'learning_rate': 4.524923076923077e-05, 'epoch': 0.29}


 10%|▉         | 11610/121875 [3:34:36<33:39:53,  1.10s/it]

{'loss': 0.827, 'learning_rate': 4.523692307692308e-05, 'epoch': 0.29}


 10%|▉         | 11640/121875 [3:35:09<33:45:53,  1.10s/it]

{'loss': 0.9023, 'learning_rate': 4.522461538461539e-05, 'epoch': 0.29}


 10%|▉         | 11670/121875 [3:35:42<33:47:49,  1.10s/it]

{'loss': 0.8851, 'learning_rate': 4.5212307692307695e-05, 'epoch': 0.29}


 10%|▉         | 11700/121875 [3:36:15<33:39:37,  1.10s/it]

{'loss': 0.8454, 'learning_rate': 4.52e-05, 'epoch': 0.29}


 10%|▉         | 11730/121875 [3:36:48<33:36:48,  1.10s/it]

{'loss': 0.8387, 'learning_rate': 4.518769230769231e-05, 'epoch': 0.29}


 10%|▉         | 11760/121875 [3:37:21<33:31:38,  1.10s/it]

{'loss': 0.7796, 'learning_rate': 4.517538461538462e-05, 'epoch': 0.29}


 10%|▉         | 11790/121875 [3:37:54<33:42:23,  1.10s/it]

{'loss': 0.8269, 'learning_rate': 4.5163076923076925e-05, 'epoch': 0.29}


 10%|▉         | 11820/121875 [3:38:27<33:37:18,  1.10s/it]

{'loss': 0.8764, 'learning_rate': 4.515076923076924e-05, 'epoch': 0.29}


 10%|▉         | 11850/121875 [3:39:00<33:42:23,  1.10s/it]

{'loss': 0.7824, 'learning_rate': 4.513846153846154e-05, 'epoch': 0.29}


 10%|▉         | 11880/121875 [3:39:33<33:36:38,  1.10s/it]

{'loss': 0.7892, 'learning_rate': 4.512615384615385e-05, 'epoch': 0.29}


 10%|▉         | 11910/121875 [3:40:06<33:24:01,  1.09s/it]

{'loss': 0.8301, 'learning_rate': 4.5113846153846155e-05, 'epoch': 0.29}


 10%|▉         | 11940/121875 [3:40:39<33:43:00,  1.10s/it]

{'loss': 0.8196, 'learning_rate': 4.510153846153846e-05, 'epoch': 0.29}


 10%|▉         | 11970/121875 [3:41:12<33:33:05,  1.10s/it]

{'loss': 0.8264, 'learning_rate': 4.508923076923077e-05, 'epoch': 0.29}


 10%|▉         | 12000/121875 [3:41:45<33:13:41,  1.09s/it]

{'loss': 0.8356, 'learning_rate': 4.507692307692308e-05, 'epoch': 0.3}


 10%|▉         | 12030/121875 [3:42:20<33:38:35,  1.10s/it]

{'loss': 0.7444, 'learning_rate': 4.506461538461539e-05, 'epoch': 0.3}


 10%|▉         | 12060/121875 [3:42:53<33:37:14,  1.10s/it]

{'loss': 0.8118, 'learning_rate': 4.50523076923077e-05, 'epoch': 0.3}


 10%|▉         | 12090/121875 [3:43:26<33:41:54,  1.11s/it]

{'loss': 0.8168, 'learning_rate': 4.504e-05, 'epoch': 0.3}


 10%|▉         | 12120/121875 [3:43:59<33:35:42,  1.10s/it]

{'loss': 0.7897, 'learning_rate': 4.502769230769231e-05, 'epoch': 0.3}


 10%|▉         | 12150/121875 [3:44:32<33:26:23,  1.10s/it]

{'loss': 0.8385, 'learning_rate': 4.5015384615384614e-05, 'epoch': 0.3}


 10%|▉         | 12180/121875 [3:45:05<33:22:44,  1.10s/it]

{'loss': 0.7706, 'learning_rate': 4.5003076923076926e-05, 'epoch': 0.3}


 10%|█         | 12210/121875 [3:45:38<33:30:34,  1.10s/it]

{'loss': 0.7766, 'learning_rate': 4.499076923076923e-05, 'epoch': 0.3}


 10%|█         | 12240/121875 [3:46:11<33:31:17,  1.10s/it]

{'loss': 0.8011, 'learning_rate': 4.4978461538461545e-05, 'epoch': 0.3}


 10%|█         | 12270/121875 [3:46:44<33:28:28,  1.10s/it]

{'loss': 0.8669, 'learning_rate': 4.4966153846153844e-05, 'epoch': 0.3}


 10%|█         | 12300/121875 [3:47:17<33:29:57,  1.10s/it]

{'loss': 0.7943, 'learning_rate': 4.4953846153846156e-05, 'epoch': 0.3}


 10%|█         | 12330/121875 [3:47:50<33:34:53,  1.10s/it]

{'loss': 0.8414, 'learning_rate': 4.494153846153846e-05, 'epoch': 0.3}


 10%|█         | 12360/121875 [3:48:23<33:27:50,  1.10s/it]

{'loss': 0.772, 'learning_rate': 4.4929230769230774e-05, 'epoch': 0.3}


 10%|█         | 12390/121875 [3:48:56<33:26:57,  1.10s/it]

{'loss': 0.8147, 'learning_rate': 4.491692307692308e-05, 'epoch': 0.3}


 10%|█         | 12420/121875 [3:49:29<33:25:31,  1.10s/it]

{'loss': 0.8054, 'learning_rate': 4.490461538461539e-05, 'epoch': 0.31}


 10%|█         | 12450/121875 [3:50:02<33:25:59,  1.10s/it]

{'loss': 0.7741, 'learning_rate': 4.489230769230769e-05, 'epoch': 0.31}


 10%|█         | 12480/121875 [3:50:35<33:29:47,  1.10s/it]

{'loss': 0.7765, 'learning_rate': 4.488e-05, 'epoch': 0.31}


 10%|█         | 12510/121875 [3:51:11<34:12:51,  1.13s/it]

{'loss': 0.8388, 'learning_rate': 4.486769230769231e-05, 'epoch': 0.31}


 10%|█         | 12540/121875 [3:51:44<33:25:14,  1.10s/it]

{'loss': 0.8445, 'learning_rate': 4.4855384615384615e-05, 'epoch': 0.31}


 10%|█         | 12570/121875 [3:52:17<33:25:49,  1.10s/it]

{'loss': 0.8257, 'learning_rate': 4.484307692307693e-05, 'epoch': 0.31}


 10%|█         | 12600/121875 [3:52:50<33:20:52,  1.10s/it]

{'loss': 0.8054, 'learning_rate': 4.4830769230769234e-05, 'epoch': 0.31}


 10%|█         | 12630/121875 [3:53:23<33:20:16,  1.10s/it]

{'loss': 0.8297, 'learning_rate': 4.481846153846154e-05, 'epoch': 0.31}


 10%|█         | 12660/121875 [3:53:56<33:07:06,  1.09s/it]

{'loss': 0.7973, 'learning_rate': 4.4806153846153845e-05, 'epoch': 0.31}


 10%|█         | 12690/121875 [3:54:29<33:22:12,  1.10s/it]

{'loss': 0.8264, 'learning_rate': 4.479384615384616e-05, 'epoch': 0.31}


 10%|█         | 12720/121875 [3:55:02<33:15:58,  1.10s/it]

{'loss': 0.8039, 'learning_rate': 4.478153846153846e-05, 'epoch': 0.31}


 10%|█         | 12750/121875 [3:55:35<33:21:33,  1.10s/it]

{'loss': 0.7966, 'learning_rate': 4.476923076923077e-05, 'epoch': 0.31}


 10%|█         | 12780/121875 [3:56:08<33:26:40,  1.10s/it]

{'loss': 0.854, 'learning_rate': 4.475692307692308e-05, 'epoch': 0.31}


 11%|█         | 12810/121875 [3:56:41<33:22:05,  1.10s/it]

{'loss': 0.855, 'learning_rate': 4.474461538461539e-05, 'epoch': 0.32}


 11%|█         | 12840/121875 [3:57:14<33:24:46,  1.10s/it]

{'loss': 0.8025, 'learning_rate': 4.473230769230769e-05, 'epoch': 0.32}


 11%|█         | 12870/121875 [3:57:47<33:19:54,  1.10s/it]

{'loss': 0.7951, 'learning_rate': 4.472e-05, 'epoch': 0.32}


 11%|█         | 12900/121875 [3:58:20<33:17:13,  1.10s/it]

{'loss': 0.8907, 'learning_rate': 4.470769230769231e-05, 'epoch': 0.32}


 11%|█         | 12930/121875 [3:58:53<33:13:44,  1.10s/it]

{'loss': 0.8407, 'learning_rate': 4.469538461538462e-05, 'epoch': 0.32}


 11%|█         | 12960/121875 [3:59:26<33:26:58,  1.11s/it]

{'loss': 0.8063, 'learning_rate': 4.468307692307693e-05, 'epoch': 0.32}


 11%|█         | 12990/121875 [3:59:59<33:19:56,  1.10s/it]

{'loss': 0.8183, 'learning_rate': 4.4670769230769235e-05, 'epoch': 0.32}


 11%|█         | 13020/121875 [4:00:34<33:14:14,  1.10s/it]

{'loss': 0.8294, 'learning_rate': 4.465846153846154e-05, 'epoch': 0.32}


 11%|█         | 13050/121875 [4:01:07<33:10:32,  1.10s/it]

{'loss': 0.8041, 'learning_rate': 4.464615384615385e-05, 'epoch': 0.32}


 11%|█         | 13080/121875 [4:01:40<33:21:58,  1.10s/it]

{'loss': 0.8424, 'learning_rate': 4.463384615384615e-05, 'epoch': 0.32}


 11%|█         | 13110/121875 [4:02:13<33:27:23,  1.11s/it]

{'loss': 0.823, 'learning_rate': 4.4621538461538465e-05, 'epoch': 0.32}


 11%|█         | 13140/121875 [4:02:46<33:10:56,  1.10s/it]

{'loss': 0.8113, 'learning_rate': 4.460923076923077e-05, 'epoch': 0.32}


 11%|█         | 13170/121875 [4:03:19<33:15:22,  1.10s/it]

{'loss': 0.8195, 'learning_rate': 4.459692307692308e-05, 'epoch': 0.32}


 11%|█         | 13200/121875 [4:03:52<33:15:25,  1.10s/it]

{'loss': 0.8352, 'learning_rate': 4.458461538461539e-05, 'epoch': 0.32}


 11%|█         | 13230/121875 [4:04:25<33:08:44,  1.10s/it]

{'loss': 0.7823, 'learning_rate': 4.4572307692307695e-05, 'epoch': 0.33}


 11%|█         | 13260/121875 [4:04:58<33:17:22,  1.10s/it]

{'loss': 0.8456, 'learning_rate': 4.456e-05, 'epoch': 0.33}


 11%|█         | 13290/121875 [4:05:31<32:58:25,  1.09s/it]

{'loss': 0.8327, 'learning_rate': 4.454769230769231e-05, 'epoch': 0.33}


 11%|█         | 13320/121875 [4:06:04<33:11:27,  1.10s/it]

{'loss': 0.8264, 'learning_rate': 4.453538461538462e-05, 'epoch': 0.33}


 11%|█         | 13350/121875 [4:06:37<33:30:27,  1.11s/it]

{'loss': 0.8061, 'learning_rate': 4.4523076923076924e-05, 'epoch': 0.33}


 11%|█         | 13380/121875 [4:07:10<33:14:03,  1.10s/it]

{'loss': 0.8265, 'learning_rate': 4.451076923076924e-05, 'epoch': 0.33}


 11%|█         | 13410/121875 [4:07:43<33:09:07,  1.10s/it]

{'loss': 0.8215, 'learning_rate': 4.4498461538461536e-05, 'epoch': 0.33}


 11%|█         | 13440/121875 [4:08:16<33:12:31,  1.10s/it]

{'loss': 0.816, 'learning_rate': 4.448615384615385e-05, 'epoch': 0.33}


 11%|█         | 13470/121875 [4:08:49<33:12:20,  1.10s/it]

{'loss': 0.7752, 'learning_rate': 4.4473846153846154e-05, 'epoch': 0.33}


 11%|█         | 13500/121875 [4:09:22<32:56:43,  1.09s/it]

{'loss': 0.8174, 'learning_rate': 4.4461538461538466e-05, 'epoch': 0.33}


 11%|█         | 13530/121875 [4:09:58<33:02:25,  1.10s/it]

{'loss': 0.8408, 'learning_rate': 4.444923076923077e-05, 'epoch': 0.33}


 11%|█         | 13560/121875 [4:10:31<33:15:06,  1.11s/it]

{'loss': 0.8033, 'learning_rate': 4.4436923076923085e-05, 'epoch': 0.33}


 11%|█         | 13590/121875 [4:11:04<33:06:22,  1.10s/it]

{'loss': 0.8326, 'learning_rate': 4.4424615384615384e-05, 'epoch': 0.33}


 11%|█         | 13620/121875 [4:11:37<32:57:24,  1.10s/it]

{'loss': 0.795, 'learning_rate': 4.441230769230769e-05, 'epoch': 0.34}


 11%|█         | 13650/121875 [4:12:10<32:58:14,  1.10s/it]

{'loss': 0.8064, 'learning_rate': 4.44e-05, 'epoch': 0.34}


 11%|█         | 13680/121875 [4:12:43<33:00:27,  1.10s/it]

{'loss': 0.8324, 'learning_rate': 4.438769230769231e-05, 'epoch': 0.34}


 11%|█         | 13710/121875 [4:13:16<33:05:44,  1.10s/it]

{'loss': 0.805, 'learning_rate': 4.437538461538462e-05, 'epoch': 0.34}


 11%|█▏        | 13740/121875 [4:13:49<33:04:50,  1.10s/it]

{'loss': 0.8453, 'learning_rate': 4.4363076923076926e-05, 'epoch': 0.34}


 11%|█▏        | 13770/121875 [4:14:22<32:57:43,  1.10s/it]

{'loss': 0.8323, 'learning_rate': 4.435076923076923e-05, 'epoch': 0.34}


 11%|█▏        | 13800/121875 [4:14:55<33:05:39,  1.10s/it]

{'loss': 0.7883, 'learning_rate': 4.433846153846154e-05, 'epoch': 0.34}


 11%|█▏        | 13830/121875 [4:15:28<32:55:43,  1.10s/it]

{'loss': 0.8238, 'learning_rate': 4.432615384615385e-05, 'epoch': 0.34}


 11%|█▏        | 13860/121875 [4:16:01<33:03:56,  1.10s/it]

{'loss': 0.7892, 'learning_rate': 4.4313846153846155e-05, 'epoch': 0.34}


 11%|█▏        | 13890/121875 [4:16:34<33:06:31,  1.10s/it]

{'loss': 0.7459, 'learning_rate': 4.430153846153846e-05, 'epoch': 0.34}


 11%|█▏        | 13920/121875 [4:17:07<33:02:11,  1.10s/it]

{'loss': 0.8532, 'learning_rate': 4.4289230769230774e-05, 'epoch': 0.34}


 11%|█▏        | 13950/121875 [4:17:40<32:53:19,  1.10s/it]

{'loss': 0.7542, 'learning_rate': 4.427692307692308e-05, 'epoch': 0.34}


 11%|█▏        | 13980/121875 [4:18:13<32:55:19,  1.10s/it]

{'loss': 0.828, 'learning_rate': 4.4264615384615385e-05, 'epoch': 0.34}


 11%|█▏        | 14010/121875 [4:18:48<33:45:14,  1.13s/it]

{'loss': 0.8237, 'learning_rate': 4.425230769230769e-05, 'epoch': 0.34}


 12%|█▏        | 14040/121875 [4:19:21<32:40:38,  1.09s/it]

{'loss': 0.8249, 'learning_rate': 4.424e-05, 'epoch': 0.35}


 12%|█▏        | 14070/121875 [4:19:54<32:56:43,  1.10s/it]

{'loss': 0.8366, 'learning_rate': 4.422769230769231e-05, 'epoch': 0.35}


 12%|█▏        | 14100/121875 [4:20:27<32:47:52,  1.10s/it]

{'loss': 0.8244, 'learning_rate': 4.421538461538462e-05, 'epoch': 0.35}


 12%|█▏        | 14130/121875 [4:21:00<32:56:31,  1.10s/it]

{'loss': 0.7685, 'learning_rate': 4.420307692307693e-05, 'epoch': 0.35}


 12%|█▏        | 14160/121875 [4:21:33<32:56:35,  1.10s/it]

{'loss': 0.8332, 'learning_rate': 4.419076923076923e-05, 'epoch': 0.35}


 12%|█▏        | 14190/121875 [4:22:06<32:50:25,  1.10s/it]

{'loss': 0.839, 'learning_rate': 4.417846153846154e-05, 'epoch': 0.35}


 12%|█▏        | 14220/121875 [4:22:39<32:57:00,  1.10s/it]

{'loss': 0.7677, 'learning_rate': 4.4166153846153844e-05, 'epoch': 0.35}


 12%|█▏        | 14250/121875 [4:23:12<32:40:29,  1.09s/it]

{'loss': 0.8565, 'learning_rate': 4.415384615384616e-05, 'epoch': 0.35}


 12%|█▏        | 14280/121875 [4:23:45<32:54:54,  1.10s/it]

{'loss': 0.8153, 'learning_rate': 4.414153846153846e-05, 'epoch': 0.35}


 12%|█▏        | 14310/121875 [4:24:18<32:47:31,  1.10s/it]

{'loss': 0.8546, 'learning_rate': 4.4129230769230775e-05, 'epoch': 0.35}


 12%|█▏        | 14340/121875 [4:24:51<32:54:40,  1.10s/it]

{'loss': 0.8198, 'learning_rate': 4.411692307692308e-05, 'epoch': 0.35}


 12%|█▏        | 14370/121875 [4:25:24<32:58:31,  1.10s/it]

{'loss': 0.8064, 'learning_rate': 4.410461538461539e-05, 'epoch': 0.35}


 12%|█▏        | 14400/121875 [4:25:57<32:49:23,  1.10s/it]

{'loss': 0.8011, 'learning_rate': 4.409230769230769e-05, 'epoch': 0.35}


 12%|█▏        | 14430/121875 [4:26:30<32:50:17,  1.10s/it]

{'loss': 0.8406, 'learning_rate': 4.4080000000000005e-05, 'epoch': 0.36}


 12%|█▏        | 14460/121875 [4:27:03<32:43:02,  1.10s/it]

{'loss': 0.8271, 'learning_rate': 4.406769230769231e-05, 'epoch': 0.36}


 12%|█▏        | 14490/121875 [4:27:36<32:49:37,  1.10s/it]

{'loss': 0.8112, 'learning_rate': 4.4055384615384616e-05, 'epoch': 0.36}


 12%|█▏        | 14520/121875 [4:28:12<32:45:50,  1.10s/it]

{'loss': 0.7894, 'learning_rate': 4.404307692307693e-05, 'epoch': 0.36}


 12%|█▏        | 14550/121875 [4:28:45<32:49:31,  1.10s/it]

{'loss': 0.8217, 'learning_rate': 4.403076923076923e-05, 'epoch': 0.36}


 12%|█▏        | 14580/121875 [4:29:18<32:40:01,  1.10s/it]

{'loss': 0.8409, 'learning_rate': 4.401846153846154e-05, 'epoch': 0.36}


 12%|█▏        | 14610/121875 [4:29:51<32:52:11,  1.10s/it]

{'loss': 0.7958, 'learning_rate': 4.4006153846153846e-05, 'epoch': 0.36}


 12%|█▏        | 14640/121875 [4:30:24<32:49:03,  1.10s/it]

{'loss': 0.7985, 'learning_rate': 4.399384615384616e-05, 'epoch': 0.36}


 12%|█▏        | 14670/121875 [4:30:57<32:41:54,  1.10s/it]

{'loss': 0.7856, 'learning_rate': 4.3981538461538464e-05, 'epoch': 0.36}


 12%|█▏        | 14700/121875 [4:31:30<32:46:02,  1.10s/it]

{'loss': 0.7825, 'learning_rate': 4.396923076923078e-05, 'epoch': 0.36}


 12%|█▏        | 14730/121875 [4:32:03<32:38:37,  1.10s/it]

{'loss': 0.8566, 'learning_rate': 4.3956923076923076e-05, 'epoch': 0.36}


 12%|█▏        | 14760/121875 [4:32:36<32:51:57,  1.10s/it]

{'loss': 0.8017, 'learning_rate': 4.394461538461538e-05, 'epoch': 0.36}


 12%|█▏        | 14790/121875 [4:33:09<32:34:58,  1.10s/it]

{'loss': 0.834, 'learning_rate': 4.3932307692307694e-05, 'epoch': 0.36}


 12%|█▏        | 14820/121875 [4:33:42<32:50:09,  1.10s/it]

{'loss': 0.8309, 'learning_rate': 4.392e-05, 'epoch': 0.36}


 12%|█▏        | 14850/121875 [4:34:15<32:36:25,  1.10s/it]

{'loss': 0.7674, 'learning_rate': 4.390769230769231e-05, 'epoch': 0.37}


 12%|█▏        | 14880/121875 [4:34:48<32:39:38,  1.10s/it]

{'loss': 0.7816, 'learning_rate': 4.389538461538462e-05, 'epoch': 0.37}


 12%|█▏        | 14910/121875 [4:35:21<32:46:02,  1.10s/it]

{'loss': 0.7591, 'learning_rate': 4.3883076923076924e-05, 'epoch': 0.37}


 12%|█▏        | 14940/121875 [4:35:54<32:42:34,  1.10s/it]

{'loss': 0.8522, 'learning_rate': 4.387076923076923e-05, 'epoch': 0.37}


 12%|█▏        | 14970/121875 [4:36:27<32:41:24,  1.10s/it]

{'loss': 0.7835, 'learning_rate': 4.385846153846154e-05, 'epoch': 0.37}


 12%|█▏        | 15000/121875 [4:37:00<32:44:39,  1.10s/it]

{'loss': 0.8159, 'learning_rate': 4.384615384615385e-05, 'epoch': 0.37}


 12%|█▏        | 15030/121875 [4:37:35<32:18:39,  1.09s/it]

{'loss': 0.7894, 'learning_rate': 4.383384615384616e-05, 'epoch': 0.37}


 12%|█▏        | 15060/121875 [4:38:08<32:23:50,  1.09s/it]

{'loss': 0.8267, 'learning_rate': 4.3821538461538466e-05, 'epoch': 0.37}


 12%|█▏        | 15090/121875 [4:38:41<32:41:58,  1.10s/it]

{'loss': 0.8561, 'learning_rate': 4.380923076923077e-05, 'epoch': 0.37}


 12%|█▏        | 15120/121875 [4:39:14<32:40:08,  1.10s/it]

{'loss': 0.7554, 'learning_rate': 4.379692307692308e-05, 'epoch': 0.37}


 12%|█▏        | 15150/121875 [4:39:47<32:36:52,  1.10s/it]

{'loss': 0.8185, 'learning_rate': 4.378461538461538e-05, 'epoch': 0.37}


 12%|█▏        | 15180/121875 [4:40:20<32:39:10,  1.10s/it]

{'loss': 0.8002, 'learning_rate': 4.3772307692307695e-05, 'epoch': 0.37}


 12%|█▏        | 15210/121875 [4:40:53<32:33:40,  1.10s/it]

{'loss': 0.8283, 'learning_rate': 4.376e-05, 'epoch': 0.37}


 13%|█▎        | 15240/121875 [4:41:26<32:36:27,  1.10s/it]

{'loss': 0.8169, 'learning_rate': 4.3747692307692314e-05, 'epoch': 0.38}


 13%|█▎        | 15270/121875 [4:41:59<32:31:30,  1.10s/it]

{'loss': 0.8116, 'learning_rate': 4.373538461538462e-05, 'epoch': 0.38}


 13%|█▎        | 15300/121875 [4:42:32<32:37:11,  1.10s/it]

{'loss': 0.7438, 'learning_rate': 4.3723076923076925e-05, 'epoch': 0.38}


 13%|█▎        | 15330/121875 [4:43:05<32:38:45,  1.10s/it]

{'loss': 0.809, 'learning_rate': 4.371076923076923e-05, 'epoch': 0.38}


 13%|█▎        | 15360/121875 [4:43:38<32:26:47,  1.10s/it]

{'loss': 0.8064, 'learning_rate': 4.3698461538461537e-05, 'epoch': 0.38}


 13%|█▎        | 15390/121875 [4:44:11<32:34:01,  1.10s/it]

{'loss': 0.8402, 'learning_rate': 4.368615384615385e-05, 'epoch': 0.38}


 13%|█▎        | 15420/121875 [4:44:44<32:19:37,  1.09s/it]

{'loss': 0.752, 'learning_rate': 4.3673846153846155e-05, 'epoch': 0.38}


 13%|█▎        | 15450/121875 [4:45:17<32:29:56,  1.10s/it]

{'loss': 0.8448, 'learning_rate': 4.366153846153847e-05, 'epoch': 0.38}


 13%|█▎        | 15480/121875 [4:45:50<32:33:45,  1.10s/it]

{'loss': 0.8047, 'learning_rate': 4.364923076923077e-05, 'epoch': 0.38}


 13%|█▎        | 15510/121875 [4:46:25<33:18:33,  1.13s/it]

{'loss': 0.8236, 'learning_rate': 4.363692307692308e-05, 'epoch': 0.38}


 13%|█▎        | 15540/121875 [4:46:59<32:38:03,  1.10s/it]

{'loss': 0.8191, 'learning_rate': 4.3624615384615384e-05, 'epoch': 0.38}


 13%|█▎        | 15570/121875 [4:47:32<32:32:15,  1.10s/it]

{'loss': 0.7896, 'learning_rate': 4.36123076923077e-05, 'epoch': 0.38}


 13%|█▎        | 15600/121875 [4:48:05<32:28:46,  1.10s/it]

{'loss': 0.7962, 'learning_rate': 4.36e-05, 'epoch': 0.38}


 13%|█▎        | 15630/121875 [4:48:38<32:24:18,  1.10s/it]

{'loss': 0.8914, 'learning_rate': 4.358769230769231e-05, 'epoch': 0.38}


 13%|█▎        | 15660/121875 [4:49:11<32:22:06,  1.10s/it]

{'loss': 0.8572, 'learning_rate': 4.357538461538462e-05, 'epoch': 0.39}


 13%|█▎        | 15690/121875 [4:49:44<32:36:56,  1.11s/it]

{'loss': 0.7982, 'learning_rate': 4.356307692307692e-05, 'epoch': 0.39}


 13%|█▎        | 15720/121875 [4:50:17<32:34:44,  1.10s/it]

{'loss': 0.8721, 'learning_rate': 4.355076923076923e-05, 'epoch': 0.39}


 13%|█▎        | 15750/121875 [4:50:50<32:28:27,  1.10s/it]

{'loss': 0.8057, 'learning_rate': 4.353846153846154e-05, 'epoch': 0.39}


 13%|█▎        | 15780/121875 [4:51:23<32:30:06,  1.10s/it]

{'loss': 0.8476, 'learning_rate': 4.352615384615385e-05, 'epoch': 0.39}


 13%|█▎        | 15810/121875 [4:51:56<32:30:07,  1.10s/it]

{'loss': 0.7861, 'learning_rate': 4.3513846153846156e-05, 'epoch': 0.39}


 13%|█▎        | 15840/121875 [4:52:29<32:28:59,  1.10s/it]

{'loss': 0.8555, 'learning_rate': 4.350153846153847e-05, 'epoch': 0.39}


 13%|█▎        | 15870/121875 [4:53:02<32:20:40,  1.10s/it]

{'loss': 0.8312, 'learning_rate': 4.348923076923077e-05, 'epoch': 0.39}


 13%|█▎        | 15900/121875 [4:53:35<32:20:38,  1.10s/it]

{'loss': 0.7426, 'learning_rate': 4.347692307692308e-05, 'epoch': 0.39}


 13%|█▎        | 15930/121875 [4:54:08<32:15:25,  1.10s/it]

{'loss': 0.854, 'learning_rate': 4.3464615384615386e-05, 'epoch': 0.39}


 13%|█▎        | 15960/121875 [4:54:41<32:19:01,  1.10s/it]

{'loss': 0.8279, 'learning_rate': 4.345230769230769e-05, 'epoch': 0.39}


 13%|█▎        | 15990/121875 [4:55:14<32:29:45,  1.10s/it]

{'loss': 0.8534, 'learning_rate': 4.3440000000000004e-05, 'epoch': 0.39}


 13%|█▎        | 16020/121875 [4:55:49<32:25:40,  1.10s/it]

{'loss': 0.8656, 'learning_rate': 4.342769230769231e-05, 'epoch': 0.39}


 13%|█▎        | 16050/121875 [4:56:22<32:12:24,  1.10s/it]

{'loss': 0.7641, 'learning_rate': 4.3415384615384616e-05, 'epoch': 0.4}


 13%|█▎        | 16080/121875 [4:56:55<32:18:29,  1.10s/it]

{'loss': 0.7989, 'learning_rate': 4.340307692307692e-05, 'epoch': 0.4}


 13%|█▎        | 16110/121875 [4:57:28<32:23:03,  1.10s/it]

{'loss': 0.8762, 'learning_rate': 4.3390769230769234e-05, 'epoch': 0.4}


 13%|█▎        | 16140/121875 [4:58:01<32:07:35,  1.09s/it]

{'loss': 0.7714, 'learning_rate': 4.337846153846154e-05, 'epoch': 0.4}


 13%|█▎        | 16170/121875 [4:58:34<32:23:35,  1.10s/it]

{'loss': 0.8288, 'learning_rate': 4.336615384615385e-05, 'epoch': 0.4}


 13%|█▎        | 16200/121875 [4:59:07<32:14:05,  1.10s/it]

{'loss': 0.8166, 'learning_rate': 4.335384615384616e-05, 'epoch': 0.4}


 13%|█▎        | 16230/121875 [4:59:40<32:25:15,  1.10s/it]

{'loss': 0.7748, 'learning_rate': 4.3341538461538464e-05, 'epoch': 0.4}


 13%|█▎        | 16260/121875 [5:00:13<32:21:55,  1.10s/it]

{'loss': 0.815, 'learning_rate': 4.332923076923077e-05, 'epoch': 0.4}


 13%|█▎        | 16290/121875 [5:00:46<32:11:54,  1.10s/it]

{'loss': 0.8441, 'learning_rate': 4.3316923076923075e-05, 'epoch': 0.4}


 13%|█▎        | 16320/121875 [5:01:19<32:25:54,  1.11s/it]

{'loss': 0.8266, 'learning_rate': 4.330461538461539e-05, 'epoch': 0.4}


 13%|█▎        | 16350/121875 [5:01:52<31:57:16,  1.09s/it]

{'loss': 0.8076, 'learning_rate': 4.329230769230769e-05, 'epoch': 0.4}


 13%|█▎        | 16380/121875 [5:02:25<32:13:32,  1.10s/it]

{'loss': 0.807, 'learning_rate': 4.3280000000000006e-05, 'epoch': 0.4}


 13%|█▎        | 16410/121875 [5:02:58<32:13:04,  1.10s/it]

{'loss': 0.7559, 'learning_rate': 4.326769230769231e-05, 'epoch': 0.4}


 13%|█▎        | 16440/121875 [5:03:31<32:03:08,  1.09s/it]

{'loss': 0.8327, 'learning_rate': 4.325538461538462e-05, 'epoch': 0.4}


 14%|█▎        | 16470/121875 [5:04:04<32:10:21,  1.10s/it]

{'loss': 0.7992, 'learning_rate': 4.324307692307692e-05, 'epoch': 0.41}


 14%|█▎        | 16500/121875 [5:04:37<32:14:07,  1.10s/it]

{'loss': 0.8172, 'learning_rate': 4.323076923076923e-05, 'epoch': 0.41}


 14%|█▎        | 16530/121875 [5:05:12<32:14:23,  1.10s/it]

{'loss': 0.7548, 'learning_rate': 4.321846153846154e-05, 'epoch': 0.41}


 14%|█▎        | 16560/121875 [5:05:45<32:09:09,  1.10s/it]

{'loss': 0.7571, 'learning_rate': 4.320615384615385e-05, 'epoch': 0.41}


 14%|█▎        | 16590/121875 [5:06:18<32:15:09,  1.10s/it]

{'loss': 0.9091, 'learning_rate': 4.319384615384616e-05, 'epoch': 0.41}


 14%|█▎        | 16620/121875 [5:06:51<32:15:18,  1.10s/it]

{'loss': 0.8475, 'learning_rate': 4.3181538461538465e-05, 'epoch': 0.41}


 14%|█▎        | 16650/121875 [5:07:24<31:59:44,  1.09s/it]

{'loss': 0.7632, 'learning_rate': 4.316923076923077e-05, 'epoch': 0.41}


 14%|█▎        | 16680/121875 [5:07:57<32:15:07,  1.10s/it]

{'loss': 0.8325, 'learning_rate': 4.3156923076923077e-05, 'epoch': 0.41}


 14%|█▎        | 16710/121875 [5:08:30<31:57:40,  1.09s/it]

{'loss': 0.7663, 'learning_rate': 4.314461538461539e-05, 'epoch': 0.41}


 14%|█▎        | 16740/121875 [5:09:03<32:15:12,  1.10s/it]

{'loss': 0.7767, 'learning_rate': 4.3132307692307695e-05, 'epoch': 0.41}


 14%|█▍        | 16770/121875 [5:09:36<32:08:02,  1.10s/it]

{'loss': 0.8616, 'learning_rate': 4.312000000000001e-05, 'epoch': 0.41}


 14%|█▍        | 16800/121875 [5:10:09<32:07:26,  1.10s/it]

{'loss': 0.8447, 'learning_rate': 4.310769230769231e-05, 'epoch': 0.41}


 14%|█▍        | 16830/121875 [5:10:42<32:14:44,  1.11s/it]

{'loss': 0.8351, 'learning_rate': 4.309538461538461e-05, 'epoch': 0.41}


 14%|█▍        | 16860/121875 [5:11:15<31:59:00,  1.10s/it]

{'loss': 0.8519, 'learning_rate': 4.3083076923076924e-05, 'epoch': 0.42}


 14%|█▍        | 16890/121875 [5:11:48<32:03:38,  1.10s/it]

{'loss': 0.8076, 'learning_rate': 4.307076923076923e-05, 'epoch': 0.42}


 14%|█▍        | 16920/121875 [5:12:21<32:01:54,  1.10s/it]

{'loss': 0.8364, 'learning_rate': 4.305846153846154e-05, 'epoch': 0.42}


 14%|█▍        | 16950/121875 [5:12:54<32:12:27,  1.11s/it]

{'loss': 0.8224, 'learning_rate': 4.304615384615385e-05, 'epoch': 0.42}


 14%|█▍        | 16980/121875 [5:13:27<31:54:00,  1.09s/it]

{'loss': 0.7841, 'learning_rate': 4.303384615384616e-05, 'epoch': 0.42}


 14%|█▍        | 17010/121875 [5:14:03<32:56:06,  1.13s/it]

{'loss': 0.7406, 'learning_rate': 4.302153846153846e-05, 'epoch': 0.42}


 14%|█▍        | 17040/121875 [5:14:36<32:07:00,  1.10s/it]

{'loss': 0.8486, 'learning_rate': 4.300923076923077e-05, 'epoch': 0.42}


 14%|█▍        | 17070/121875 [5:15:09<32:01:06,  1.10s/it]

{'loss': 0.8161, 'learning_rate': 4.299692307692308e-05, 'epoch': 0.42}


 14%|█▍        | 17100/121875 [5:15:42<31:57:18,  1.10s/it]

{'loss': 0.7614, 'learning_rate': 4.2984615384615384e-05, 'epoch': 0.42}


 14%|█▍        | 17130/121875 [5:16:15<32:01:32,  1.10s/it]

{'loss': 0.7736, 'learning_rate': 4.2972307692307696e-05, 'epoch': 0.42}


 14%|█▍        | 17160/121875 [5:16:48<31:57:53,  1.10s/it]

{'loss': 0.7882, 'learning_rate': 4.296e-05, 'epoch': 0.42}


 14%|█▍        | 17190/121875 [5:17:21<31:58:51,  1.10s/it]

{'loss': 0.7986, 'learning_rate': 4.294769230769231e-05, 'epoch': 0.42}


 14%|█▍        | 17220/121875 [5:17:54<32:01:37,  1.10s/it]

{'loss': 0.8834, 'learning_rate': 4.2935384615384613e-05, 'epoch': 0.42}


 14%|█▍        | 17250/121875 [5:18:27<31:53:39,  1.10s/it]

{'loss': 0.8231, 'learning_rate': 4.2923076923076926e-05, 'epoch': 0.42}


 14%|█▍        | 17280/121875 [5:19:00<31:51:26,  1.10s/it]

{'loss': 0.8148, 'learning_rate': 4.291076923076923e-05, 'epoch': 0.43}


 14%|█▍        | 17310/121875 [5:19:33<31:56:48,  1.10s/it]

{'loss': 0.8119, 'learning_rate': 4.2898461538461544e-05, 'epoch': 0.43}


 14%|█▍        | 17340/121875 [5:20:06<32:01:23,  1.10s/it]

{'loss': 0.7652, 'learning_rate': 4.288615384615385e-05, 'epoch': 0.43}


 14%|█▍        | 17370/121875 [5:20:39<32:01:41,  1.10s/it]

{'loss': 0.8031, 'learning_rate': 4.2873846153846156e-05, 'epoch': 0.43}


 14%|█▍        | 17400/121875 [5:21:12<31:50:38,  1.10s/it]

{'loss': 0.7973, 'learning_rate': 4.286153846153846e-05, 'epoch': 0.43}


 14%|█▍        | 17430/121875 [5:21:45<31:58:28,  1.10s/it]

{'loss': 0.7734, 'learning_rate': 4.284923076923077e-05, 'epoch': 0.43}


 14%|█▍        | 17460/121875 [5:22:18<31:57:38,  1.10s/it]

{'loss': 0.8214, 'learning_rate': 4.283692307692308e-05, 'epoch': 0.43}


 14%|█▍        | 17490/121875 [5:22:51<31:48:22,  1.10s/it]

{'loss': 0.8525, 'learning_rate': 4.2824615384615385e-05, 'epoch': 0.43}


 14%|█▍        | 17520/121875 [5:23:26<31:57:22,  1.10s/it]

{'loss': 0.8085, 'learning_rate': 4.28123076923077e-05, 'epoch': 0.43}


 14%|█▍        | 17550/121875 [5:23:59<31:42:38,  1.09s/it]

{'loss': 0.8454, 'learning_rate': 4.2800000000000004e-05, 'epoch': 0.43}


 14%|█▍        | 17580/121875 [5:24:32<31:56:24,  1.10s/it]

{'loss': 0.7718, 'learning_rate': 4.278769230769231e-05, 'epoch': 0.43}


 14%|█▍        | 17610/121875 [5:25:05<31:46:48,  1.10s/it]

{'loss': 0.8304, 'learning_rate': 4.2775384615384615e-05, 'epoch': 0.43}


 14%|█▍        | 17640/121875 [5:25:38<31:43:58,  1.10s/it]

{'loss': 0.815, 'learning_rate': 4.276307692307692e-05, 'epoch': 0.43}


 14%|█▍        | 17670/121875 [5:26:11<31:51:27,  1.10s/it]

{'loss': 0.7658, 'learning_rate': 4.275076923076923e-05, 'epoch': 0.43}


 15%|█▍        | 17700/121875 [5:26:44<31:47:26,  1.10s/it]

{'loss': 0.7458, 'learning_rate': 4.273846153846154e-05, 'epoch': 0.44}


 15%|█▍        | 17730/121875 [5:27:17<31:39:58,  1.09s/it]

{'loss': 0.8272, 'learning_rate': 4.272615384615385e-05, 'epoch': 0.44}


 15%|█▍        | 17760/121875 [5:27:50<31:49:55,  1.10s/it]

{'loss': 0.8343, 'learning_rate': 4.271384615384616e-05, 'epoch': 0.44}


 15%|█▍        | 17790/121875 [5:28:23<31:44:40,  1.10s/it]

{'loss': 0.7657, 'learning_rate': 4.270153846153846e-05, 'epoch': 0.44}


 15%|█▍        | 17820/121875 [5:28:56<31:42:21,  1.10s/it]

{'loss': 0.8122, 'learning_rate': 4.268923076923077e-05, 'epoch': 0.44}


 15%|█▍        | 17850/121875 [5:29:29<31:46:59,  1.10s/it]

{'loss': 0.8235, 'learning_rate': 4.267692307692308e-05, 'epoch': 0.44}


 15%|█▍        | 17880/121875 [5:30:02<31:47:39,  1.10s/it]

{'loss': 0.8392, 'learning_rate': 4.266461538461539e-05, 'epoch': 0.44}


 15%|█▍        | 17910/121875 [5:30:35<31:46:22,  1.10s/it]

{'loss': 0.7619, 'learning_rate': 4.26523076923077e-05, 'epoch': 0.44}


 15%|█▍        | 17940/121875 [5:31:08<31:48:16,  1.10s/it]

{'loss': 0.8169, 'learning_rate': 4.2640000000000005e-05, 'epoch': 0.44}


 15%|█▍        | 17970/121875 [5:31:41<31:48:08,  1.10s/it]

{'loss': 0.7689, 'learning_rate': 4.2627692307692304e-05, 'epoch': 0.44}


 15%|█▍        | 18000/121875 [5:32:14<31:49:56,  1.10s/it]

{'loss': 0.7773, 'learning_rate': 4.2615384615384617e-05, 'epoch': 0.44}


 15%|█▍        | 18030/121875 [5:32:49<31:41:56,  1.10s/it]

{'loss': 0.8297, 'learning_rate': 4.260307692307692e-05, 'epoch': 0.44}


 15%|█▍        | 18060/121875 [5:33:22<31:44:53,  1.10s/it]

{'loss': 0.8035, 'learning_rate': 4.2590769230769235e-05, 'epoch': 0.44}


 15%|█▍        | 18090/121875 [5:33:55<31:45:59,  1.10s/it]

{'loss': 0.8199, 'learning_rate': 4.257846153846154e-05, 'epoch': 0.45}


 15%|█▍        | 18120/121875 [5:34:28<31:38:30,  1.10s/it]

{'loss': 0.8059, 'learning_rate': 4.256615384615385e-05, 'epoch': 0.45}


 15%|█▍        | 18150/121875 [5:35:01<31:46:03,  1.10s/it]

{'loss': 0.7923, 'learning_rate': 4.255384615384615e-05, 'epoch': 0.45}


 15%|█▍        | 18180/121875 [5:35:34<31:35:10,  1.10s/it]

{'loss': 0.7819, 'learning_rate': 4.2541538461538464e-05, 'epoch': 0.45}


 15%|█▍        | 18210/121875 [5:36:07<31:39:11,  1.10s/it]

{'loss': 0.7806, 'learning_rate': 4.252923076923077e-05, 'epoch': 0.45}


 15%|█▍        | 18240/121875 [5:36:40<31:44:27,  1.10s/it]

{'loss': 0.7646, 'learning_rate': 4.2516923076923076e-05, 'epoch': 0.45}


 15%|█▍        | 18270/121875 [5:37:13<31:34:57,  1.10s/it]

{'loss': 0.785, 'learning_rate': 4.250461538461539e-05, 'epoch': 0.45}


 15%|█▌        | 18300/121875 [5:37:46<31:40:45,  1.10s/it]

{'loss': 0.7723, 'learning_rate': 4.2492307692307694e-05, 'epoch': 0.45}


 15%|█▌        | 18330/121875 [5:38:19<31:39:54,  1.10s/it]

{'loss': 0.8526, 'learning_rate': 4.248e-05, 'epoch': 0.45}


 15%|█▌        | 18360/121875 [5:38:52<31:41:17,  1.10s/it]

{'loss': 0.8303, 'learning_rate': 4.2467692307692306e-05, 'epoch': 0.45}


 15%|█▌        | 18390/121875 [5:39:25<31:29:38,  1.10s/it]

{'loss': 0.8193, 'learning_rate': 4.245538461538462e-05, 'epoch': 0.45}


 15%|█▌        | 18420/121875 [5:39:58<31:26:54,  1.09s/it]

{'loss': 0.7947, 'learning_rate': 4.2443076923076924e-05, 'epoch': 0.45}


 15%|█▌        | 18450/121875 [5:40:31<31:33:35,  1.10s/it]

{'loss': 0.8014, 'learning_rate': 4.2430769230769236e-05, 'epoch': 0.45}


 15%|█▌        | 18480/121875 [5:41:04<31:32:28,  1.10s/it]

{'loss': 0.7618, 'learning_rate': 4.241846153846154e-05, 'epoch': 0.45}


 15%|█▌        | 18510/121875 [5:41:39<32:21:30,  1.13s/it]

{'loss': 0.8088, 'learning_rate': 4.240615384615385e-05, 'epoch': 0.46}


 15%|█▌        | 18540/121875 [5:42:12<31:44:16,  1.11s/it]

{'loss': 0.7741, 'learning_rate': 4.2393846153846153e-05, 'epoch': 0.46}


 15%|█▌        | 18570/121875 [5:42:45<31:34:54,  1.10s/it]

{'loss': 0.8469, 'learning_rate': 4.238153846153846e-05, 'epoch': 0.46}


 15%|█▌        | 18600/121875 [5:43:18<31:36:37,  1.10s/it]

{'loss': 0.7851, 'learning_rate': 4.236923076923077e-05, 'epoch': 0.46}


 15%|█▌        | 18630/121875 [5:43:51<31:41:06,  1.10s/it]

{'loss': 0.7539, 'learning_rate': 4.235692307692308e-05, 'epoch': 0.46}


 15%|█▌        | 18660/121875 [5:44:24<31:35:13,  1.10s/it]

{'loss': 0.84, 'learning_rate': 4.234461538461539e-05, 'epoch': 0.46}


 15%|█▌        | 18690/121875 [5:44:57<31:29:52,  1.10s/it]

{'loss': 0.7721, 'learning_rate': 4.2332307692307696e-05, 'epoch': 0.46}


 15%|█▌        | 18720/121875 [5:45:30<31:27:17,  1.10s/it]

{'loss': 0.7474, 'learning_rate': 4.232e-05, 'epoch': 0.46}


 15%|█▌        | 18750/121875 [5:46:03<31:40:38,  1.11s/it]

{'loss': 0.7684, 'learning_rate': 4.230769230769231e-05, 'epoch': 0.46}


 15%|█▌        | 18780/121875 [5:46:36<31:31:01,  1.10s/it]

{'loss': 0.7544, 'learning_rate': 4.229538461538462e-05, 'epoch': 0.46}


 15%|█▌        | 18810/121875 [5:47:09<31:34:30,  1.10s/it]

{'loss': 0.822, 'learning_rate': 4.2283076923076925e-05, 'epoch': 0.46}


 15%|█▌        | 18840/121875 [5:47:42<31:27:54,  1.10s/it]

{'loss': 0.8126, 'learning_rate': 4.227076923076923e-05, 'epoch': 0.46}


 15%|█▌        | 18870/121875 [5:48:15<31:30:21,  1.10s/it]

{'loss': 0.8306, 'learning_rate': 4.2258461538461544e-05, 'epoch': 0.46}


 16%|█▌        | 18900/121875 [5:48:48<31:29:02,  1.10s/it]

{'loss': 0.8232, 'learning_rate': 4.224615384615385e-05, 'epoch': 0.47}


 16%|█▌        | 18930/121875 [5:49:21<31:31:04,  1.10s/it]

{'loss': 0.8233, 'learning_rate': 4.2233846153846155e-05, 'epoch': 0.47}


 16%|█▌        | 18960/121875 [5:49:54<31:29:08,  1.10s/it]

{'loss': 0.805, 'learning_rate': 4.222153846153846e-05, 'epoch': 0.47}


 16%|█▌        | 18990/121875 [5:50:27<31:25:33,  1.10s/it]

{'loss': 0.7147, 'learning_rate': 4.220923076923077e-05, 'epoch': 0.47}


 16%|█▌        | 19020/121875 [5:51:03<31:31:44,  1.10s/it]

{'loss': 0.7879, 'learning_rate': 4.219692307692308e-05, 'epoch': 0.47}


 16%|█▌        | 19050/121875 [5:51:36<31:24:51,  1.10s/it]

{'loss': 0.8457, 'learning_rate': 4.218461538461539e-05, 'epoch': 0.47}


 16%|█▌        | 19080/121875 [5:52:09<31:19:11,  1.10s/it]

{'loss': 0.8714, 'learning_rate': 4.21723076923077e-05, 'epoch': 0.47}


 16%|█▌        | 19110/121875 [5:52:42<31:23:14,  1.10s/it]

{'loss': 0.7808, 'learning_rate': 4.2159999999999996e-05, 'epoch': 0.47}


 16%|█▌        | 19140/121875 [5:53:14<31:18:18,  1.10s/it]

{'loss': 0.7785, 'learning_rate': 4.214769230769231e-05, 'epoch': 0.47}


 16%|█▌        | 19170/121875 [5:53:47<31:15:35,  1.10s/it]

{'loss': 0.7437, 'learning_rate': 4.2135384615384614e-05, 'epoch': 0.47}


 16%|█▌        | 19200/121875 [5:54:20<31:26:43,  1.10s/it]

{'loss': 0.8121, 'learning_rate': 4.212307692307693e-05, 'epoch': 0.47}


 16%|█▌        | 19230/121875 [5:54:53<31:30:28,  1.11s/it]

{'loss': 0.8577, 'learning_rate': 4.211076923076923e-05, 'epoch': 0.47}


 16%|█▌        | 19260/121875 [5:55:27<31:20:02,  1.10s/it]

{'loss': 0.8298, 'learning_rate': 4.2098461538461545e-05, 'epoch': 0.47}


 16%|█▌        | 19290/121875 [5:55:59<31:17:07,  1.10s/it]

{'loss': 0.785, 'learning_rate': 4.2086153846153844e-05, 'epoch': 0.47}


 16%|█▌        | 19320/121875 [5:56:33<31:25:20,  1.10s/it]

{'loss': 0.8134, 'learning_rate': 4.2073846153846157e-05, 'epoch': 0.48}


 16%|█▌        | 19350/121875 [5:57:06<31:14:54,  1.10s/it]

{'loss': 0.8027, 'learning_rate': 4.206153846153846e-05, 'epoch': 0.48}


 16%|█▌        | 19380/121875 [5:57:39<31:24:53,  1.10s/it]

{'loss': 0.7657, 'learning_rate': 4.204923076923077e-05, 'epoch': 0.48}


 16%|█▌        | 19410/121875 [5:58:12<31:16:39,  1.10s/it]

{'loss': 0.8335, 'learning_rate': 4.203692307692308e-05, 'epoch': 0.48}


 16%|█▌        | 19440/121875 [5:58:44<31:15:22,  1.10s/it]

{'loss': 0.8565, 'learning_rate': 4.2024615384615386e-05, 'epoch': 0.48}


 16%|█▌        | 19470/121875 [5:59:17<31:14:09,  1.10s/it]

{'loss': 0.7901, 'learning_rate': 4.201230769230769e-05, 'epoch': 0.48}


 16%|█▌        | 19500/121875 [5:59:50<31:13:01,  1.10s/it]

{'loss': 0.7678, 'learning_rate': 4.2e-05, 'epoch': 0.48}


 16%|█▌        | 19530/121875 [6:00:26<31:25:04,  1.11s/it]

{'loss': 0.721, 'learning_rate': 4.198769230769231e-05, 'epoch': 0.48}


 16%|█▌        | 19560/121875 [6:00:59<31:15:18,  1.10s/it]

{'loss': 0.8287, 'learning_rate': 4.1975384615384616e-05, 'epoch': 0.48}


 16%|█▌        | 19590/121875 [6:01:32<31:18:33,  1.10s/it]

{'loss': 0.7301, 'learning_rate': 4.196307692307693e-05, 'epoch': 0.48}


 16%|█▌        | 19620/121875 [6:02:05<31:20:01,  1.10s/it]

{'loss': 0.8545, 'learning_rate': 4.1950769230769234e-05, 'epoch': 0.48}


 16%|█▌        | 19650/121875 [6:02:38<31:21:01,  1.10s/it]

{'loss': 0.7985, 'learning_rate': 4.193846153846154e-05, 'epoch': 0.48}


 16%|█▌        | 19680/121875 [6:03:11<31:05:02,  1.09s/it]

{'loss': 0.7003, 'learning_rate': 4.1926153846153846e-05, 'epoch': 0.48}


 16%|█▌        | 19710/121875 [6:03:44<31:11:55,  1.10s/it]

{'loss': 0.7626, 'learning_rate': 4.191384615384615e-05, 'epoch': 0.49}


 16%|█▌        | 19740/121875 [6:04:17<31:10:43,  1.10s/it]

{'loss': 0.8151, 'learning_rate': 4.1901538461538464e-05, 'epoch': 0.49}


 16%|█▌        | 19770/121875 [6:04:50<31:16:59,  1.10s/it]

{'loss': 0.7996, 'learning_rate': 4.188923076923077e-05, 'epoch': 0.49}


 16%|█▌        | 19800/121875 [6:05:23<31:08:18,  1.10s/it]

{'loss': 0.8654, 'learning_rate': 4.187692307692308e-05, 'epoch': 0.49}


 16%|█▋        | 19830/121875 [6:05:56<31:13:30,  1.10s/it]

{'loss': 0.809, 'learning_rate': 4.186461538461539e-05, 'epoch': 0.49}


 16%|█▋        | 19860/121875 [6:06:29<31:15:16,  1.10s/it]

{'loss': 0.8601, 'learning_rate': 4.1852307692307693e-05, 'epoch': 0.49}


 16%|█▋        | 19890/121875 [6:07:02<31:15:14,  1.10s/it]

{'loss': 0.8166, 'learning_rate': 4.184e-05, 'epoch': 0.49}


 16%|█▋        | 19920/121875 [6:07:35<31:05:36,  1.10s/it]

{'loss': 0.7566, 'learning_rate': 4.182769230769231e-05, 'epoch': 0.49}


 16%|█▋        | 19950/121875 [6:08:08<31:08:35,  1.10s/it]

{'loss': 0.7838, 'learning_rate': 4.181538461538462e-05, 'epoch': 0.49}


 16%|█▋        | 19980/121875 [6:08:41<31:11:19,  1.10s/it]

{'loss': 0.8811, 'learning_rate': 4.180307692307692e-05, 'epoch': 0.49}


 16%|█▋        | 20010/121875 [6:09:16<31:59:23,  1.13s/it]

{'loss': 0.7028, 'learning_rate': 4.1790769230769236e-05, 'epoch': 0.49}


 16%|█▋        | 20040/121875 [6:09:49<31:05:01,  1.10s/it]

{'loss': 0.7849, 'learning_rate': 4.177846153846154e-05, 'epoch': 0.49}


 16%|█▋        | 20070/121875 [6:10:22<31:08:43,  1.10s/it]

{'loss': 0.7162, 'learning_rate': 4.176615384615385e-05, 'epoch': 0.49}


 16%|█▋        | 20100/121875 [6:10:55<31:05:21,  1.10s/it]

{'loss': 0.7573, 'learning_rate': 4.175384615384615e-05, 'epoch': 0.49}


 17%|█▋        | 20130/121875 [6:11:28<31:09:20,  1.10s/it]

{'loss': 0.7977, 'learning_rate': 4.1741538461538465e-05, 'epoch': 0.5}


 17%|█▋        | 20160/121875 [6:12:01<30:58:58,  1.10s/it]

{'loss': 0.787, 'learning_rate': 4.172923076923077e-05, 'epoch': 0.5}


 17%|█▋        | 20190/121875 [6:12:34<31:02:16,  1.10s/it]

{'loss': 0.7589, 'learning_rate': 4.1716923076923084e-05, 'epoch': 0.5}


 17%|█▋        | 20220/121875 [6:13:07<30:47:12,  1.09s/it]

{'loss': 0.792, 'learning_rate': 4.170461538461539e-05, 'epoch': 0.5}


 17%|█▋        | 20250/121875 [6:13:40<30:58:35,  1.10s/it]

{'loss': 0.7741, 'learning_rate': 4.169230769230769e-05, 'epoch': 0.5}


 17%|█▋        | 20280/121875 [6:14:13<31:13:34,  1.11s/it]

{'loss': 0.8208, 'learning_rate': 4.168e-05, 'epoch': 0.5}


 17%|█▋        | 20310/121875 [6:14:46<31:00:55,  1.10s/it]

{'loss': 0.8048, 'learning_rate': 4.1667692307692306e-05, 'epoch': 0.5}


 17%|█▋        | 20340/121875 [6:15:19<30:51:24,  1.09s/it]

{'loss': 0.8165, 'learning_rate': 4.165538461538462e-05, 'epoch': 0.5}


 17%|█▋        | 20370/121875 [6:15:52<30:57:37,  1.10s/it]

{'loss': 0.7819, 'learning_rate': 4.1643076923076925e-05, 'epoch': 0.5}


 17%|█▋        | 20400/121875 [6:16:25<30:58:39,  1.10s/it]

{'loss': 0.7804, 'learning_rate': 4.163076923076924e-05, 'epoch': 0.5}


 17%|█▋        | 20430/121875 [6:16:58<30:50:29,  1.09s/it]

{'loss': 0.789, 'learning_rate': 4.1618461538461536e-05, 'epoch': 0.5}


 17%|█▋        | 20460/121875 [6:17:31<31:06:54,  1.10s/it]

{'loss': 0.7885, 'learning_rate': 4.160615384615385e-05, 'epoch': 0.5}


 17%|█▋        | 20490/121875 [6:18:04<30:52:12,  1.10s/it]

{'loss': 0.8066, 'learning_rate': 4.1593846153846154e-05, 'epoch': 0.5}


 17%|█▋        | 20520/121875 [6:18:39<30:54:47,  1.10s/it]

{'loss': 0.8315, 'learning_rate': 4.158153846153847e-05, 'epoch': 0.51}


 17%|█▋        | 20550/121875 [6:19:12<31:01:07,  1.10s/it]

{'loss': 0.8354, 'learning_rate': 4.156923076923077e-05, 'epoch': 0.51}


 17%|█▋        | 20580/121875 [6:19:45<31:05:19,  1.10s/it]

{'loss': 0.7696, 'learning_rate': 4.155692307692308e-05, 'epoch': 0.51}


 17%|█▋        | 20610/121875 [6:20:18<30:50:52,  1.10s/it]

{'loss': 0.7995, 'learning_rate': 4.1544615384615384e-05, 'epoch': 0.51}


 17%|█▋        | 20640/121875 [6:20:51<31:01:52,  1.10s/it]

{'loss': 0.7666, 'learning_rate': 4.153230769230769e-05, 'epoch': 0.51}


 17%|█▋        | 20670/121875 [6:21:24<30:46:46,  1.09s/it]

{'loss': 0.8156, 'learning_rate': 4.152e-05, 'epoch': 0.51}


 17%|█▋        | 20700/121875 [6:21:57<30:45:31,  1.09s/it]

{'loss': 0.853, 'learning_rate': 4.150769230769231e-05, 'epoch': 0.51}


 17%|█▋        | 20730/121875 [6:22:30<31:08:52,  1.11s/it]

{'loss': 0.774, 'learning_rate': 4.149538461538462e-05, 'epoch': 0.51}


 17%|█▋        | 20760/121875 [6:23:03<30:49:44,  1.10s/it]

{'loss': 0.784, 'learning_rate': 4.1483076923076926e-05, 'epoch': 0.51}


 17%|█▋        | 20790/121875 [6:23:36<30:50:22,  1.10s/it]

{'loss': 0.7869, 'learning_rate': 4.147076923076923e-05, 'epoch': 0.51}


 17%|█▋        | 20820/121875 [6:24:09<30:55:32,  1.10s/it]

{'loss': 0.7857, 'learning_rate': 4.145846153846154e-05, 'epoch': 0.51}


 17%|█▋        | 20850/121875 [6:24:42<30:42:14,  1.09s/it]

{'loss': 0.7968, 'learning_rate': 4.1446153846153843e-05, 'epoch': 0.51}


 17%|█▋        | 20880/121875 [6:25:15<30:49:32,  1.10s/it]

{'loss': 0.7743, 'learning_rate': 4.1433846153846156e-05, 'epoch': 0.51}


 17%|█▋        | 20910/121875 [6:25:48<30:55:42,  1.10s/it]

{'loss': 0.7437, 'learning_rate': 4.142153846153846e-05, 'epoch': 0.51}


 17%|█▋        | 20940/121875 [6:26:21<30:52:14,  1.10s/it]

{'loss': 0.7776, 'learning_rate': 4.1409230769230774e-05, 'epoch': 0.52}


 17%|█▋        | 20970/121875 [6:26:54<30:49:54,  1.10s/it]

{'loss': 0.8085, 'learning_rate': 4.139692307692308e-05, 'epoch': 0.52}


 17%|█▋        | 21000/121875 [6:27:27<30:44:01,  1.10s/it]

{'loss': 0.7784, 'learning_rate': 4.1384615384615386e-05, 'epoch': 0.52}


 17%|█▋        | 21030/121875 [6:28:03<30:49:33,  1.10s/it]

{'loss': 0.801, 'learning_rate': 4.137230769230769e-05, 'epoch': 0.52}


 17%|█▋        | 21060/121875 [6:28:36<30:47:32,  1.10s/it]

{'loss': 0.7799, 'learning_rate': 4.1360000000000004e-05, 'epoch': 0.52}


 17%|█▋        | 21090/121875 [6:29:09<30:53:01,  1.10s/it]

{'loss': 0.7828, 'learning_rate': 4.134769230769231e-05, 'epoch': 0.52}


 17%|█▋        | 21120/121875 [6:29:42<30:51:26,  1.10s/it]

{'loss': 0.7475, 'learning_rate': 4.1335384615384615e-05, 'epoch': 0.52}


 17%|█▋        | 21150/121875 [6:30:15<30:43:54,  1.10s/it]

{'loss': 0.7693, 'learning_rate': 4.132307692307693e-05, 'epoch': 0.52}


 17%|█▋        | 21180/121875 [6:30:48<30:44:20,  1.10s/it]

{'loss': 0.8429, 'learning_rate': 4.1310769230769233e-05, 'epoch': 0.52}


 17%|█▋        | 21210/121875 [6:31:21<30:47:07,  1.10s/it]

{'loss': 0.7887, 'learning_rate': 4.129846153846154e-05, 'epoch': 0.52}


 17%|█▋        | 21240/121875 [6:31:54<30:48:55,  1.10s/it]

{'loss': 0.8145, 'learning_rate': 4.1286153846153845e-05, 'epoch': 0.52}


 17%|█▋        | 21270/121875 [6:32:27<30:44:13,  1.10s/it]

{'loss': 0.8491, 'learning_rate': 4.127384615384616e-05, 'epoch': 0.52}


 17%|█▋        | 21300/121875 [6:33:00<30:42:25,  1.10s/it]

{'loss': 0.7723, 'learning_rate': 4.126153846153846e-05, 'epoch': 0.52}


 18%|█▊        | 21330/121875 [6:33:33<30:37:51,  1.10s/it]

{'loss': 0.7098, 'learning_rate': 4.1249230769230776e-05, 'epoch': 0.53}


 18%|█▊        | 21360/121875 [6:34:06<30:27:38,  1.09s/it]

{'loss': 0.7755, 'learning_rate': 4.123692307692308e-05, 'epoch': 0.53}


 18%|█▊        | 21390/121875 [6:34:39<30:38:53,  1.10s/it]

{'loss': 0.7834, 'learning_rate': 4.122461538461539e-05, 'epoch': 0.53}


 18%|█▊        | 21420/121875 [6:35:12<30:49:24,  1.10s/it]

{'loss': 0.7477, 'learning_rate': 4.121230769230769e-05, 'epoch': 0.53}


 18%|█▊        | 21450/121875 [6:35:45<30:31:36,  1.09s/it]

{'loss': 0.8277, 'learning_rate': 4.12e-05, 'epoch': 0.53}


 18%|█▊        | 21480/121875 [6:36:18<30:33:25,  1.10s/it]

{'loss': 0.7943, 'learning_rate': 4.118769230769231e-05, 'epoch': 0.53}


 18%|█▊        | 21510/121875 [6:36:53<31:20:30,  1.12s/it]

{'loss': 0.8018, 'learning_rate': 4.117538461538462e-05, 'epoch': 0.53}


 18%|█▊        | 21540/121875 [6:37:26<30:40:10,  1.10s/it]

{'loss': 0.7094, 'learning_rate': 4.116307692307693e-05, 'epoch': 0.53}


 18%|█▊        | 21570/121875 [6:37:59<30:36:04,  1.10s/it]

{'loss': 0.7694, 'learning_rate': 4.1150769230769235e-05, 'epoch': 0.53}


 18%|█▊        | 21600/121875 [6:38:32<30:33:36,  1.10s/it]

{'loss': 0.7722, 'learning_rate': 4.113846153846154e-05, 'epoch': 0.53}


 18%|█▊        | 21630/121875 [6:39:05<30:41:05,  1.10s/it]

{'loss': 0.6969, 'learning_rate': 4.1126153846153846e-05, 'epoch': 0.53}


 18%|█▊        | 21660/121875 [6:39:38<30:42:22,  1.10s/it]

{'loss': 0.7368, 'learning_rate': 4.111384615384616e-05, 'epoch': 0.53}


 18%|█▊        | 21690/121875 [6:40:11<30:40:20,  1.10s/it]

{'loss': 0.8255, 'learning_rate': 4.1101538461538465e-05, 'epoch': 0.53}


 18%|█▊        | 21720/121875 [6:40:44<30:37:28,  1.10s/it]

{'loss': 0.7728, 'learning_rate': 4.108923076923077e-05, 'epoch': 0.53}


 18%|█▊        | 21750/121875 [6:41:17<30:38:15,  1.10s/it]

{'loss': 0.7988, 'learning_rate': 4.1076923076923076e-05, 'epoch': 0.54}


 18%|█▊        | 21780/121875 [6:41:50<30:38:18,  1.10s/it]

{'loss': 0.7114, 'learning_rate': 4.106461538461538e-05, 'epoch': 0.54}


 18%|█▊        | 21810/121875 [6:42:23<30:33:22,  1.10s/it]

{'loss': 0.7425, 'learning_rate': 4.1052307692307694e-05, 'epoch': 0.54}


 18%|█▊        | 21840/121875 [6:42:56<30:34:18,  1.10s/it]

{'loss': 0.8523, 'learning_rate': 4.104e-05, 'epoch': 0.54}


 18%|█▊        | 21870/121875 [6:43:29<30:24:39,  1.09s/it]

{'loss': 0.8147, 'learning_rate': 4.102769230769231e-05, 'epoch': 0.54}


 18%|█▊        | 21900/121875 [6:44:02<30:36:15,  1.10s/it]

{'loss': 0.8221, 'learning_rate': 4.101538461538462e-05, 'epoch': 0.54}


 18%|█▊        | 21930/121875 [6:44:35<30:33:09,  1.10s/it]

{'loss': 0.7941, 'learning_rate': 4.1003076923076924e-05, 'epoch': 0.54}


 18%|█▊        | 21960/121875 [6:45:09<30:32:00,  1.10s/it]

{'loss': 0.8766, 'learning_rate': 4.099076923076923e-05, 'epoch': 0.54}


 18%|█▊        | 21990/121875 [6:45:42<30:28:18,  1.10s/it]

{'loss': 0.7567, 'learning_rate': 4.0978461538461535e-05, 'epoch': 0.54}


 18%|█▊        | 22020/121875 [6:46:17<30:28:46,  1.10s/it]

{'loss': 0.8325, 'learning_rate': 4.096615384615385e-05, 'epoch': 0.54}


 18%|█▊        | 22050/121875 [6:46:50<30:29:42,  1.10s/it]

{'loss': 0.7868, 'learning_rate': 4.0953846153846154e-05, 'epoch': 0.54}


 18%|█▊        | 22080/121875 [6:47:23<30:27:28,  1.10s/it]

{'loss': 0.7743, 'learning_rate': 4.0941538461538466e-05, 'epoch': 0.54}


 18%|█▊        | 22110/121875 [6:47:56<30:24:18,  1.10s/it]

{'loss': 0.803, 'learning_rate': 4.092923076923077e-05, 'epoch': 0.54}


 18%|█▊        | 22140/121875 [6:48:29<30:31:18,  1.10s/it]

{'loss': 0.8299, 'learning_rate': 4.091692307692308e-05, 'epoch': 0.54}


 18%|█▊        | 22170/121875 [6:49:02<30:29:05,  1.10s/it]

{'loss': 0.7914, 'learning_rate': 4.0904615384615383e-05, 'epoch': 0.55}


 18%|█▊        | 22200/121875 [6:49:35<30:28:52,  1.10s/it]

{'loss': 0.7893, 'learning_rate': 4.0892307692307696e-05, 'epoch': 0.55}


 18%|█▊        | 22230/121875 [6:50:08<30:26:00,  1.10s/it]

{'loss': 0.8811, 'learning_rate': 4.088e-05, 'epoch': 0.55}


 18%|█▊        | 22260/121875 [6:50:41<30:19:05,  1.10s/it]

{'loss': 0.7952, 'learning_rate': 4.0867692307692314e-05, 'epoch': 0.55}


 18%|█▊        | 22290/121875 [6:51:14<30:30:17,  1.10s/it]

{'loss': 0.7826, 'learning_rate': 4.085538461538462e-05, 'epoch': 0.55}


 18%|█▊        | 22320/121875 [6:51:47<30:22:52,  1.10s/it]

{'loss': 0.7623, 'learning_rate': 4.0843076923076926e-05, 'epoch': 0.55}


 18%|█▊        | 22350/121875 [6:52:20<30:24:22,  1.10s/it]

{'loss': 0.8087, 'learning_rate': 4.083076923076923e-05, 'epoch': 0.55}


 18%|█▊        | 22380/121875 [6:52:53<30:19:02,  1.10s/it]

{'loss': 0.7905, 'learning_rate': 4.081846153846154e-05, 'epoch': 0.55}


 18%|█▊        | 22410/121875 [6:53:26<30:24:11,  1.10s/it]

{'loss': 0.8566, 'learning_rate': 4.080615384615385e-05, 'epoch': 0.55}


 18%|█▊        | 22440/121875 [6:53:59<30:27:49,  1.10s/it]

{'loss': 0.7687, 'learning_rate': 4.0793846153846155e-05, 'epoch': 0.55}


 18%|█▊        | 22470/121875 [6:54:32<30:18:59,  1.10s/it]

{'loss': 0.7875, 'learning_rate': 4.078153846153847e-05, 'epoch': 0.55}


 18%|█▊        | 22500/121875 [6:55:05<30:30:30,  1.11s/it]

{'loss': 0.8039, 'learning_rate': 4.0769230769230773e-05, 'epoch': 0.55}


 18%|█▊        | 22530/121875 [6:55:40<30:24:18,  1.10s/it]

{'loss': 0.771, 'learning_rate': 4.075692307692308e-05, 'epoch': 0.55}


 19%|█▊        | 22560/121875 [6:56:13<30:18:20,  1.10s/it]

{'loss': 0.8121, 'learning_rate': 4.0744615384615385e-05, 'epoch': 0.56}


 19%|█▊        | 22590/121875 [6:56:46<30:34:22,  1.11s/it]

{'loss': 0.7864, 'learning_rate': 4.073230769230769e-05, 'epoch': 0.56}


 19%|█▊        | 22620/121875 [6:57:19<30:27:43,  1.10s/it]

{'loss': 0.8176, 'learning_rate': 4.072e-05, 'epoch': 0.56}


 19%|█▊        | 22650/121875 [6:57:52<30:08:34,  1.09s/it]

{'loss': 0.8106, 'learning_rate': 4.070769230769231e-05, 'epoch': 0.56}


 19%|█▊        | 22680/121875 [6:58:25<30:20:50,  1.10s/it]

{'loss': 0.8109, 'learning_rate': 4.069538461538462e-05, 'epoch': 0.56}


 19%|█▊        | 22710/121875 [6:58:58<30:18:50,  1.10s/it]

{'loss': 0.787, 'learning_rate': 4.068307692307693e-05, 'epoch': 0.56}


 19%|█▊        | 22740/121875 [6:59:31<30:19:36,  1.10s/it]

{'loss': 0.7583, 'learning_rate': 4.067076923076923e-05, 'epoch': 0.56}


 19%|█▊        | 22770/121875 [7:00:04<30:08:47,  1.10s/it]

{'loss': 0.8012, 'learning_rate': 4.065846153846154e-05, 'epoch': 0.56}


 19%|█▊        | 22800/121875 [7:00:37<30:07:53,  1.09s/it]

{'loss': 0.7376, 'learning_rate': 4.064615384615385e-05, 'epoch': 0.56}


 19%|█▊        | 22830/121875 [7:01:10<30:17:05,  1.10s/it]

{'loss': 0.8188, 'learning_rate': 4.063384615384616e-05, 'epoch': 0.56}


 19%|█▉        | 22860/121875 [7:01:43<30:06:18,  1.09s/it]

{'loss': 0.7232, 'learning_rate': 4.062153846153846e-05, 'epoch': 0.56}


 19%|█▉        | 22890/121875 [7:02:16<30:11:31,  1.10s/it]

{'loss': 0.7552, 'learning_rate': 4.0609230769230775e-05, 'epoch': 0.56}


 19%|█▉        | 22920/121875 [7:02:49<30:13:41,  1.10s/it]

{'loss': 0.8843, 'learning_rate': 4.0596923076923074e-05, 'epoch': 0.56}


 19%|█▉        | 22950/121875 [7:03:22<30:20:47,  1.10s/it]

{'loss': 0.7433, 'learning_rate': 4.0584615384615386e-05, 'epoch': 0.56}


 19%|█▉        | 22980/121875 [7:03:55<30:10:28,  1.10s/it]

{'loss': 0.7642, 'learning_rate': 4.057230769230769e-05, 'epoch': 0.57}


 19%|█▉        | 23010/121875 [7:04:31<31:00:16,  1.13s/it]

{'loss': 0.8343, 'learning_rate': 4.0560000000000005e-05, 'epoch': 0.57}


 19%|█▉        | 23040/121875 [7:05:04<30:19:59,  1.10s/it]

{'loss': 0.7915, 'learning_rate': 4.054769230769231e-05, 'epoch': 0.57}


 19%|█▉        | 23070/121875 [7:05:37<30:16:22,  1.10s/it]

{'loss': 0.8349, 'learning_rate': 4.0535384615384616e-05, 'epoch': 0.57}


 19%|█▉        | 23100/121875 [7:06:10<30:15:33,  1.10s/it]

{'loss': 0.8487, 'learning_rate': 4.052307692307692e-05, 'epoch': 0.57}


 19%|█▉        | 23130/121875 [7:06:43<30:11:00,  1.10s/it]

{'loss': 0.8442, 'learning_rate': 4.0510769230769234e-05, 'epoch': 0.57}


 19%|█▉        | 23160/121875 [7:07:16<30:09:03,  1.10s/it]

{'loss': 0.8087, 'learning_rate': 4.049846153846154e-05, 'epoch': 0.57}


 19%|█▉        | 23190/121875 [7:07:48<30:04:50,  1.10s/it]

{'loss': 0.7475, 'learning_rate': 4.0486153846153846e-05, 'epoch': 0.57}


 19%|█▉        | 23220/121875 [7:08:21<30:03:22,  1.10s/it]

{'loss': 0.7664, 'learning_rate': 4.047384615384616e-05, 'epoch': 0.57}


 19%|█▉        | 23250/121875 [7:08:55<30:12:36,  1.10s/it]

{'loss': 0.8039, 'learning_rate': 4.0461538461538464e-05, 'epoch': 0.57}


 19%|█▉        | 23280/121875 [7:09:27<30:06:53,  1.10s/it]

{'loss': 0.7608, 'learning_rate': 4.044923076923077e-05, 'epoch': 0.57}


 19%|█▉        | 23310/121875 [7:10:00<30:00:33,  1.10s/it]

{'loss': 0.7733, 'learning_rate': 4.0436923076923075e-05, 'epoch': 0.57}


 19%|█▉        | 23340/121875 [7:10:34<30:04:47,  1.10s/it]

{'loss': 0.7996, 'learning_rate': 4.042461538461539e-05, 'epoch': 0.57}


 19%|█▉        | 23370/121875 [7:11:07<30:09:10,  1.10s/it]

{'loss': 0.8183, 'learning_rate': 4.0412307692307694e-05, 'epoch': 0.58}


 19%|█▉        | 23400/121875 [7:11:40<30:15:17,  1.11s/it]

{'loss': 0.7842, 'learning_rate': 4.0400000000000006e-05, 'epoch': 0.58}


 19%|█▉        | 23430/121875 [7:12:13<30:07:45,  1.10s/it]

{'loss': 0.7871, 'learning_rate': 4.038769230769231e-05, 'epoch': 0.58}


 19%|█▉        | 23460/121875 [7:12:46<30:03:00,  1.10s/it]

{'loss': 0.8347, 'learning_rate': 4.037538461538462e-05, 'epoch': 0.58}


 19%|█▉        | 23490/121875 [7:13:19<29:46:14,  1.09s/it]

{'loss': 0.8275, 'learning_rate': 4.0363076923076923e-05, 'epoch': 0.58}


 19%|█▉        | 23520/121875 [7:13:54<29:51:47,  1.09s/it]

{'loss': 0.7569, 'learning_rate': 4.035076923076923e-05, 'epoch': 0.58}


 19%|█▉        | 23550/121875 [7:14:27<29:59:03,  1.10s/it]

{'loss': 0.787, 'learning_rate': 4.033846153846154e-05, 'epoch': 0.58}


 19%|█▉        | 23580/121875 [7:15:00<30:02:28,  1.10s/it]

{'loss': 0.816, 'learning_rate': 4.032615384615385e-05, 'epoch': 0.58}


 19%|█▉        | 23610/121875 [7:15:33<29:58:39,  1.10s/it]

{'loss': 0.8183, 'learning_rate': 4.031384615384616e-05, 'epoch': 0.58}


 19%|█▉        | 23640/121875 [7:16:06<30:00:18,  1.10s/it]

{'loss': 0.8646, 'learning_rate': 4.0301538461538466e-05, 'epoch': 0.58}


 19%|█▉        | 23670/121875 [7:16:39<29:53:28,  1.10s/it]

{'loss': 0.8388, 'learning_rate': 4.028923076923077e-05, 'epoch': 0.58}


 19%|█▉        | 23700/121875 [7:17:12<29:59:02,  1.10s/it]

{'loss': 0.8593, 'learning_rate': 4.027692307692308e-05, 'epoch': 0.58}


 19%|█▉        | 23730/121875 [7:17:45<29:59:41,  1.10s/it]

{'loss': 0.8558, 'learning_rate': 4.026461538461538e-05, 'epoch': 0.58}


 19%|█▉        | 23760/121875 [7:18:18<29:59:59,  1.10s/it]

{'loss': 0.8465, 'learning_rate': 4.0252307692307695e-05, 'epoch': 0.58}


 20%|█▉        | 23790/121875 [7:18:51<29:50:22,  1.10s/it]

{'loss': 0.702, 'learning_rate': 4.024e-05, 'epoch': 0.59}


 20%|█▉        | 23820/121875 [7:19:24<29:52:34,  1.10s/it]

{'loss': 0.84, 'learning_rate': 4.0227692307692313e-05, 'epoch': 0.59}


 20%|█▉        | 23850/121875 [7:19:57<29:53:32,  1.10s/it]

{'loss': 0.8078, 'learning_rate': 4.021538461538462e-05, 'epoch': 0.59}


 20%|█▉        | 23880/121875 [7:20:30<30:03:12,  1.10s/it]

{'loss': 0.8157, 'learning_rate': 4.0203076923076925e-05, 'epoch': 0.59}


 20%|█▉        | 23910/121875 [7:21:03<29:54:57,  1.10s/it]

{'loss': 0.7562, 'learning_rate': 4.019076923076923e-05, 'epoch': 0.59}


 20%|█▉        | 23940/121875 [7:21:36<30:05:14,  1.11s/it]

{'loss': 0.7861, 'learning_rate': 4.017846153846154e-05, 'epoch': 0.59}


 20%|█▉        | 23970/121875 [7:22:09<29:47:21,  1.10s/it]

{'loss': 0.7807, 'learning_rate': 4.016615384615385e-05, 'epoch': 0.59}


 20%|█▉        | 24000/121875 [7:22:42<29:55:03,  1.10s/it]

{'loss': 0.7449, 'learning_rate': 4.0153846153846155e-05, 'epoch': 0.59}


 20%|█▉        | 24030/121875 [7:23:17<29:54:46,  1.10s/it]

{'loss': 0.7771, 'learning_rate': 4.014153846153847e-05, 'epoch': 0.59}


 20%|█▉        | 24060/121875 [7:23:50<30:00:51,  1.10s/it]

{'loss': 0.841, 'learning_rate': 4.0129230769230766e-05, 'epoch': 0.59}


 20%|█▉        | 24090/121875 [7:24:23<29:50:05,  1.10s/it]

{'loss': 0.7539, 'learning_rate': 4.011692307692308e-05, 'epoch': 0.59}


 20%|█▉        | 24120/121875 [7:24:56<30:05:56,  1.11s/it]

{'loss': 0.8274, 'learning_rate': 4.0104615384615384e-05, 'epoch': 0.59}


 20%|█▉        | 24150/121875 [7:25:30<30:03:48,  1.11s/it]

{'loss': 0.7827, 'learning_rate': 4.00923076923077e-05, 'epoch': 0.59}


 20%|█▉        | 24180/121875 [7:26:03<29:57:43,  1.10s/it]

{'loss': 0.8365, 'learning_rate': 4.008e-05, 'epoch': 0.6}


 20%|█▉        | 24210/121875 [7:26:36<29:51:10,  1.10s/it]

{'loss': 0.7344, 'learning_rate': 4.0067692307692315e-05, 'epoch': 0.6}


 20%|█▉        | 24240/121875 [7:27:09<29:44:47,  1.10s/it]

{'loss': 0.823, 'learning_rate': 4.0055384615384614e-05, 'epoch': 0.6}


 20%|█▉        | 24270/121875 [7:27:42<29:51:50,  1.10s/it]

{'loss': 0.7615, 'learning_rate': 4.0043076923076926e-05, 'epoch': 0.6}


 20%|█▉        | 24300/121875 [7:28:15<29:56:33,  1.10s/it]

{'loss': 0.8016, 'learning_rate': 4.003076923076923e-05, 'epoch': 0.6}


 20%|█▉        | 24330/121875 [7:28:48<29:50:26,  1.10s/it]

{'loss': 0.7377, 'learning_rate': 4.001846153846154e-05, 'epoch': 0.6}


 20%|█▉        | 24360/121875 [7:29:21<29:48:55,  1.10s/it]

{'loss': 0.85, 'learning_rate': 4.000615384615385e-05, 'epoch': 0.6}


 20%|██        | 24390/121875 [7:29:54<29:43:35,  1.10s/it]

{'loss': 0.7903, 'learning_rate': 3.9993846153846156e-05, 'epoch': 0.6}


 20%|██        | 24420/121875 [7:30:27<29:48:05,  1.10s/it]

{'loss': 0.7841, 'learning_rate': 3.998153846153846e-05, 'epoch': 0.6}


 20%|██        | 24450/121875 [7:31:01<30:17:26,  1.12s/it]

{'loss': 0.757, 'learning_rate': 3.996923076923077e-05, 'epoch': 0.6}


 20%|██        | 24480/121875 [7:31:34<29:50:40,  1.10s/it]

{'loss': 0.7989, 'learning_rate': 3.995692307692308e-05, 'epoch': 0.6}


 20%|██        | 24510/121875 [7:32:09<30:29:00,  1.13s/it]

{'loss': 0.7655, 'learning_rate': 3.9944615384615386e-05, 'epoch': 0.6}


 20%|██        | 24540/121875 [7:32:42<29:51:21,  1.10s/it]

{'loss': 0.7865, 'learning_rate': 3.99323076923077e-05, 'epoch': 0.6}


 20%|██        | 24570/121875 [7:33:15<29:47:44,  1.10s/it]

{'loss': 0.7209, 'learning_rate': 3.9920000000000004e-05, 'epoch': 0.6}


 20%|██        | 24600/121875 [7:33:48<29:14:26,  1.08s/it]

{'loss': 0.7622, 'learning_rate': 3.990769230769231e-05, 'epoch': 0.61}


 20%|██        | 24630/121875 [7:34:20<29:10:40,  1.08s/it]

{'loss': 0.7247, 'learning_rate': 3.9895384615384615e-05, 'epoch': 0.61}


 20%|██        | 24660/121875 [7:34:53<29:30:41,  1.09s/it]

{'loss': 0.7563, 'learning_rate': 3.988307692307692e-05, 'epoch': 0.61}


 20%|██        | 24690/121875 [7:35:26<29:43:21,  1.10s/it]

{'loss': 0.7723, 'learning_rate': 3.9870769230769234e-05, 'epoch': 0.61}


 20%|██        | 24720/121875 [7:35:59<29:42:20,  1.10s/it]

{'loss': 0.7478, 'learning_rate': 3.985846153846154e-05, 'epoch': 0.61}


 20%|██        | 24750/121875 [7:36:32<29:46:50,  1.10s/it]

{'loss': 0.8267, 'learning_rate': 3.984615384615385e-05, 'epoch': 0.61}


 20%|██        | 24780/121875 [7:37:05<29:44:06,  1.10s/it]

{'loss': 0.8277, 'learning_rate': 3.983384615384616e-05, 'epoch': 0.61}


 20%|██        | 24810/121875 [7:37:38<29:41:08,  1.10s/it]

{'loss': 0.7949, 'learning_rate': 3.9821538461538463e-05, 'epoch': 0.61}


 20%|██        | 24840/121875 [7:38:12<29:49:37,  1.11s/it]

{'loss': 0.7695, 'learning_rate': 3.980923076923077e-05, 'epoch': 0.61}


 20%|██        | 24870/121875 [7:38:45<29:51:53,  1.11s/it]

{'loss': 0.7899, 'learning_rate': 3.9796923076923075e-05, 'epoch': 0.61}


 20%|██        | 24900/121875 [7:39:18<29:44:43,  1.10s/it]

{'loss': 0.7877, 'learning_rate': 3.978461538461539e-05, 'epoch': 0.61}


 20%|██        | 24930/121875 [7:39:51<29:39:09,  1.10s/it]

{'loss': 0.8059, 'learning_rate': 3.977230769230769e-05, 'epoch': 0.61}


 20%|██        | 24960/121875 [7:40:24<29:44:35,  1.10s/it]

{'loss': 0.7615, 'learning_rate': 3.9760000000000006e-05, 'epoch': 0.61}


 21%|██        | 24990/121875 [7:40:57<29:44:36,  1.11s/it]

{'loss': 0.7709, 'learning_rate': 3.974769230769231e-05, 'epoch': 0.62}


 21%|██        | 25020/121875 [7:41:32<29:42:54,  1.10s/it]

{'loss': 0.8006, 'learning_rate': 3.973538461538462e-05, 'epoch': 0.62}


 21%|██        | 25050/121875 [7:42:05<29:38:31,  1.10s/it]

{'loss': 0.7632, 'learning_rate': 3.972307692307692e-05, 'epoch': 0.62}


 21%|██        | 25080/121875 [7:42:38<29:41:32,  1.10s/it]

{'loss': 0.7676, 'learning_rate': 3.9710769230769235e-05, 'epoch': 0.62}


 21%|██        | 25110/121875 [7:43:12<29:32:31,  1.10s/it]

{'loss': 0.8275, 'learning_rate': 3.969846153846154e-05, 'epoch': 0.62}


 21%|██        | 25140/121875 [7:43:45<29:30:09,  1.10s/it]

{'loss': 0.7575, 'learning_rate': 3.9686153846153853e-05, 'epoch': 0.62}


 21%|██        | 25170/121875 [7:44:18<29:37:13,  1.10s/it]

{'loss': 0.8107, 'learning_rate': 3.967384615384616e-05, 'epoch': 0.62}


 21%|██        | 25200/121875 [7:44:51<29:38:56,  1.10s/it]

{'loss': 0.8309, 'learning_rate': 3.966153846153846e-05, 'epoch': 0.62}


 21%|██        | 25230/121875 [7:45:24<29:31:42,  1.10s/it]

{'loss': 0.8344, 'learning_rate': 3.964923076923077e-05, 'epoch': 0.62}


 21%|██        | 25260/121875 [7:45:57<29:34:46,  1.10s/it]

{'loss': 0.7866, 'learning_rate': 3.9636923076923076e-05, 'epoch': 0.62}


 21%|██        | 25290/121875 [7:46:30<29:31:04,  1.10s/it]

{'loss': 0.7739, 'learning_rate': 3.962461538461539e-05, 'epoch': 0.62}


 21%|██        | 25320/121875 [7:47:03<29:37:08,  1.10s/it]

{'loss': 0.7917, 'learning_rate': 3.9612307692307695e-05, 'epoch': 0.62}


 21%|██        | 25350/121875 [7:47:36<29:28:05,  1.10s/it]

{'loss': 0.812, 'learning_rate': 3.960000000000001e-05, 'epoch': 0.62}


 21%|██        | 25380/121875 [7:48:09<29:35:29,  1.10s/it]

{'loss': 0.815, 'learning_rate': 3.9587692307692306e-05, 'epoch': 0.62}


 21%|██        | 25410/121875 [7:48:42<29:31:39,  1.10s/it]

{'loss': 0.8356, 'learning_rate': 3.957538461538462e-05, 'epoch': 0.63}


 21%|██        | 25440/121875 [7:49:15<29:41:43,  1.11s/it]

{'loss': 0.8025, 'learning_rate': 3.9563076923076924e-05, 'epoch': 0.63}


 21%|██        | 25470/121875 [7:49:48<29:35:06,  1.10s/it]

{'loss': 0.8049, 'learning_rate': 3.955076923076923e-05, 'epoch': 0.63}


 21%|██        | 25500/121875 [7:50:21<29:32:52,  1.10s/it]

{'loss': 0.7842, 'learning_rate': 3.953846153846154e-05, 'epoch': 0.63}


 21%|██        | 25530/121875 [7:50:57<29:33:36,  1.10s/it]

{'loss': 0.7795, 'learning_rate': 3.952615384615385e-05, 'epoch': 0.63}


 21%|██        | 25560/121875 [7:51:30<29:30:04,  1.10s/it]

{'loss': 0.7989, 'learning_rate': 3.9513846153846154e-05, 'epoch': 0.63}


 21%|██        | 25590/121875 [7:52:03<29:15:24,  1.09s/it]

{'loss': 0.7472, 'learning_rate': 3.950153846153846e-05, 'epoch': 0.63}


 21%|██        | 25620/121875 [7:52:36<29:29:39,  1.10s/it]

{'loss': 0.8247, 'learning_rate': 3.948923076923077e-05, 'epoch': 0.63}


 21%|██        | 25650/121875 [7:53:09<29:26:39,  1.10s/it]

{'loss': 0.788, 'learning_rate': 3.947692307692308e-05, 'epoch': 0.63}


 21%|██        | 25680/121875 [7:53:42<29:26:32,  1.10s/it]

{'loss': 0.7482, 'learning_rate': 3.946461538461539e-05, 'epoch': 0.63}


 21%|██        | 25710/121875 [7:54:15<29:26:12,  1.10s/it]

{'loss': 0.7742, 'learning_rate': 3.9452307692307696e-05, 'epoch': 0.63}


 21%|██        | 25740/121875 [7:54:48<29:25:42,  1.10s/it]

{'loss': 0.816, 'learning_rate': 3.944e-05, 'epoch': 0.63}


 21%|██        | 25770/121875 [7:55:21<29:36:57,  1.11s/it]

{'loss': 0.8331, 'learning_rate': 3.942769230769231e-05, 'epoch': 0.63}


 21%|██        | 25800/121875 [7:55:55<29:29:18,  1.10s/it]

{'loss': 0.8162, 'learning_rate': 3.941538461538461e-05, 'epoch': 0.64}


 21%|██        | 25830/121875 [7:56:28<29:30:59,  1.11s/it]

{'loss': 0.7729, 'learning_rate': 3.9403076923076926e-05, 'epoch': 0.64}


 21%|██        | 25860/121875 [7:57:01<29:25:27,  1.10s/it]

{'loss': 0.8892, 'learning_rate': 3.939076923076923e-05, 'epoch': 0.64}


 21%|██        | 25890/121875 [7:57:34<29:24:12,  1.10s/it]

{'loss': 0.809, 'learning_rate': 3.9378461538461544e-05, 'epoch': 0.64}


 21%|██▏       | 25920/121875 [7:58:07<29:18:18,  1.10s/it]

{'loss': 0.7483, 'learning_rate': 3.936615384615385e-05, 'epoch': 0.64}


 21%|██▏       | 25950/121875 [7:58:40<29:23:56,  1.10s/it]

{'loss': 0.7139, 'learning_rate': 3.9353846153846155e-05, 'epoch': 0.64}


 21%|██▏       | 25980/121875 [7:59:13<29:26:15,  1.11s/it]

{'loss': 0.7781, 'learning_rate': 3.934153846153846e-05, 'epoch': 0.64}


 21%|██▏       | 26010/121875 [7:59:48<30:01:43,  1.13s/it]

{'loss': 0.7974, 'learning_rate': 3.9329230769230774e-05, 'epoch': 0.64}


 21%|██▏       | 26040/121875 [8:00:22<29:23:09,  1.10s/it]

{'loss': 0.7602, 'learning_rate': 3.931692307692308e-05, 'epoch': 0.64}


 21%|██▏       | 26070/121875 [8:00:55<29:20:34,  1.10s/it]

{'loss': 0.7913, 'learning_rate': 3.9304615384615385e-05, 'epoch': 0.64}


 21%|██▏       | 26100/121875 [8:01:28<29:20:29,  1.10s/it]

{'loss': 0.7669, 'learning_rate': 3.92923076923077e-05, 'epoch': 0.64}


 21%|██▏       | 26130/121875 [8:02:01<29:12:31,  1.10s/it]

{'loss': 0.8342, 'learning_rate': 3.9280000000000003e-05, 'epoch': 0.64}


 21%|██▏       | 26160/121875 [8:02:34<29:21:01,  1.10s/it]

{'loss': 0.706, 'learning_rate': 3.926769230769231e-05, 'epoch': 0.64}


 21%|██▏       | 26190/121875 [8:03:07<29:17:59,  1.10s/it]

{'loss': 0.7694, 'learning_rate': 3.9255384615384615e-05, 'epoch': 0.64}


 22%|██▏       | 26220/121875 [8:03:40<28:32:35,  1.07s/it]

{'loss': 0.8221, 'learning_rate': 3.924307692307693e-05, 'epoch': 0.65}


 22%|██▏       | 26250/121875 [8:04:12<28:34:36,  1.08s/it]

{'loss': 0.7275, 'learning_rate': 3.923076923076923e-05, 'epoch': 0.65}


 22%|██▏       | 26280/121875 [8:04:44<29:01:34,  1.09s/it]

{'loss': 0.7591, 'learning_rate': 3.9218461538461546e-05, 'epoch': 0.65}


 22%|██▏       | 26310/121875 [8:05:17<29:22:07,  1.11s/it]

{'loss': 0.7834, 'learning_rate': 3.920615384615385e-05, 'epoch': 0.65}


 22%|██▏       | 26340/121875 [8:05:50<29:16:34,  1.10s/it]

{'loss': 0.7606, 'learning_rate': 3.919384615384615e-05, 'epoch': 0.65}


 22%|██▏       | 26370/121875 [8:06:23<29:15:43,  1.10s/it]

{'loss': 0.7826, 'learning_rate': 3.918153846153846e-05, 'epoch': 0.65}


 22%|██▏       | 26400/121875 [8:06:56<29:11:41,  1.10s/it]

{'loss': 0.7147, 'learning_rate': 3.916923076923077e-05, 'epoch': 0.65}


 22%|██▏       | 26430/121875 [8:07:29<29:13:29,  1.10s/it]

{'loss': 0.7654, 'learning_rate': 3.915692307692308e-05, 'epoch': 0.65}


 22%|██▏       | 26460/121875 [8:08:02<29:11:29,  1.10s/it]

{'loss': 0.7993, 'learning_rate': 3.914461538461539e-05, 'epoch': 0.65}


 22%|██▏       | 26490/121875 [8:08:35<29:11:24,  1.10s/it]

{'loss': 0.765, 'learning_rate': 3.91323076923077e-05, 'epoch': 0.65}


 22%|██▏       | 26520/121875 [8:09:10<29:09:16,  1.10s/it]

{'loss': 0.7702, 'learning_rate': 3.912e-05, 'epoch': 0.65}


 22%|██▏       | 26550/121875 [8:09:43<29:11:22,  1.10s/it]

{'loss': 0.7738, 'learning_rate': 3.910769230769231e-05, 'epoch': 0.65}


 22%|██▏       | 26580/121875 [8:10:16<29:05:23,  1.10s/it]

{'loss': 0.7614, 'learning_rate': 3.9095384615384616e-05, 'epoch': 0.65}


 22%|██▏       | 26610/121875 [8:10:49<29:04:07,  1.10s/it]

{'loss': 0.727, 'learning_rate': 3.908307692307692e-05, 'epoch': 0.66}


 22%|██▏       | 26640/121875 [8:11:23<29:15:13,  1.11s/it]

{'loss': 0.8095, 'learning_rate': 3.9070769230769235e-05, 'epoch': 0.66}


 22%|██▏       | 26670/121875 [8:11:56<29:12:47,  1.10s/it]

{'loss': 0.7537, 'learning_rate': 3.905846153846154e-05, 'epoch': 0.66}


 22%|██▏       | 26700/121875 [8:12:29<29:03:03,  1.10s/it]

{'loss': 0.7442, 'learning_rate': 3.9046153846153846e-05, 'epoch': 0.66}


 22%|██▏       | 26730/121875 [8:13:02<29:10:53,  1.10s/it]

{'loss': 0.7931, 'learning_rate': 3.903384615384615e-05, 'epoch': 0.66}


 22%|██▏       | 26760/121875 [8:13:35<28:29:46,  1.08s/it]

{'loss': 0.7956, 'learning_rate': 3.9021538461538464e-05, 'epoch': 0.66}


 22%|██▏       | 26790/121875 [8:14:07<28:37:45,  1.08s/it]

{'loss': 0.8377, 'learning_rate': 3.900923076923077e-05, 'epoch': 0.66}


 22%|██▏       | 26820/121875 [8:14:40<28:28:30,  1.08s/it]

{'loss': 0.8224, 'learning_rate': 3.899692307692308e-05, 'epoch': 0.66}


 22%|██▏       | 26850/121875 [8:15:12<29:08:48,  1.10s/it]

{'loss': 0.8454, 'learning_rate': 3.898461538461539e-05, 'epoch': 0.66}


 22%|██▏       | 26880/121875 [8:15:46<28:59:57,  1.10s/it]

{'loss': 0.7883, 'learning_rate': 3.8972307692307694e-05, 'epoch': 0.66}


 22%|██▏       | 26910/121875 [8:16:19<29:10:10,  1.11s/it]

{'loss': 0.8003, 'learning_rate': 3.896e-05, 'epoch': 0.66}


 22%|██▏       | 26940/121875 [8:16:52<29:02:05,  1.10s/it]

{'loss': 0.78, 'learning_rate': 3.8947692307692305e-05, 'epoch': 0.66}


 22%|██▏       | 26970/121875 [8:17:25<29:04:07,  1.10s/it]

{'loss': 0.7436, 'learning_rate': 3.893538461538462e-05, 'epoch': 0.66}


 22%|██▏       | 27000/121875 [8:17:58<29:03:46,  1.10s/it]

{'loss': 0.7675, 'learning_rate': 3.8923076923076924e-05, 'epoch': 0.66}


 22%|██▏       | 27030/121875 [8:18:33<28:54:25,  1.10s/it]

{'loss': 0.7771, 'learning_rate': 3.8910769230769236e-05, 'epoch': 0.67}


 22%|██▏       | 27060/121875 [8:19:06<29:03:15,  1.10s/it]

{'loss': 0.8074, 'learning_rate': 3.889846153846154e-05, 'epoch': 0.67}


 22%|██▏       | 27090/121875 [8:19:39<29:03:13,  1.10s/it]

{'loss': 0.7653, 'learning_rate': 3.888615384615385e-05, 'epoch': 0.67}


 22%|██▏       | 27120/121875 [8:20:12<28:56:53,  1.10s/it]

{'loss': 0.7512, 'learning_rate': 3.887384615384615e-05, 'epoch': 0.67}


 22%|██▏       | 27150/121875 [8:20:45<29:03:47,  1.10s/it]

{'loss': 0.7962, 'learning_rate': 3.8861538461538466e-05, 'epoch': 0.67}


 22%|██▏       | 27180/121875 [8:21:18<29:00:37,  1.10s/it]

{'loss': 0.853, 'learning_rate': 3.884923076923077e-05, 'epoch': 0.67}


 22%|██▏       | 27210/121875 [8:21:51<28:58:17,  1.10s/it]

{'loss': 0.8274, 'learning_rate': 3.883692307692308e-05, 'epoch': 0.67}


 22%|██▏       | 27240/121875 [8:22:24<28:47:42,  1.10s/it]

{'loss': 0.8573, 'learning_rate': 3.882461538461539e-05, 'epoch': 0.67}


 22%|██▏       | 27270/121875 [8:22:57<29:03:37,  1.11s/it]

{'loss': 0.813, 'learning_rate': 3.8812307692307695e-05, 'epoch': 0.67}


 22%|██▏       | 27300/121875 [8:23:31<29:06:16,  1.11s/it]

{'loss': 0.7795, 'learning_rate': 3.88e-05, 'epoch': 0.67}


 22%|██▏       | 27330/121875 [8:24:04<28:43:46,  1.09s/it]

{'loss': 0.8241, 'learning_rate': 3.878769230769231e-05, 'epoch': 0.67}


 22%|██▏       | 27360/121875 [8:24:36<28:17:52,  1.08s/it]

{'loss': 0.7572, 'learning_rate': 3.877538461538462e-05, 'epoch': 0.67}


 22%|██▏       | 27390/121875 [8:25:09<28:58:29,  1.10s/it]

{'loss': 0.7821, 'learning_rate': 3.8763076923076925e-05, 'epoch': 0.67}


 22%|██▏       | 27420/121875 [8:25:42<29:02:50,  1.11s/it]

{'loss': 0.8679, 'learning_rate': 3.875076923076924e-05, 'epoch': 0.67}


 23%|██▎       | 27450/121875 [8:26:15<28:56:08,  1.10s/it]

{'loss': 0.7927, 'learning_rate': 3.873846153846154e-05, 'epoch': 0.68}


 23%|██▎       | 27480/121875 [8:26:48<28:52:32,  1.10s/it]

{'loss': 0.8434, 'learning_rate': 3.872615384615384e-05, 'epoch': 0.68}


 23%|██▎       | 27510/121875 [8:27:24<29:41:22,  1.13s/it]

{'loss': 0.8156, 'learning_rate': 3.8713846153846155e-05, 'epoch': 0.68}


 23%|██▎       | 27540/121875 [8:27:57<28:55:50,  1.10s/it]

{'loss': 0.8329, 'learning_rate': 3.870153846153846e-05, 'epoch': 0.68}


 23%|██▎       | 27570/121875 [8:28:30<28:46:46,  1.10s/it]

{'loss': 0.7669, 'learning_rate': 3.868923076923077e-05, 'epoch': 0.68}


 23%|██▎       | 27600/121875 [8:29:03<28:56:27,  1.11s/it]

{'loss': 0.7769, 'learning_rate': 3.867692307692308e-05, 'epoch': 0.68}


 23%|██▎       | 27630/121875 [8:29:36<28:46:38,  1.10s/it]

{'loss': 0.6786, 'learning_rate': 3.866461538461539e-05, 'epoch': 0.68}


 23%|██▎       | 27660/121875 [8:30:09<28:48:34,  1.10s/it]

{'loss': 0.8023, 'learning_rate': 3.865230769230769e-05, 'epoch': 0.68}


 23%|██▎       | 27690/121875 [8:30:42<29:00:33,  1.11s/it]

{'loss': 0.7832, 'learning_rate': 3.864e-05, 'epoch': 0.68}


 23%|██▎       | 27720/121875 [8:31:15<28:54:50,  1.11s/it]

{'loss': 0.8323, 'learning_rate': 3.862769230769231e-05, 'epoch': 0.68}


 23%|██▎       | 27750/121875 [8:31:48<28:47:31,  1.10s/it]

{'loss': 0.8044, 'learning_rate': 3.861538461538462e-05, 'epoch': 0.68}


 23%|██▎       | 27780/121875 [8:32:22<28:51:48,  1.10s/it]

{'loss': 0.7857, 'learning_rate': 3.860307692307693e-05, 'epoch': 0.68}


 23%|██▎       | 27810/121875 [8:32:55<28:54:58,  1.11s/it]

{'loss': 0.8173, 'learning_rate': 3.859076923076923e-05, 'epoch': 0.68}


 23%|██▎       | 27840/121875 [8:33:28<28:47:15,  1.10s/it]

{'loss': 0.8602, 'learning_rate': 3.857846153846154e-05, 'epoch': 0.69}


 23%|██▎       | 27870/121875 [8:34:01<28:50:45,  1.10s/it]

{'loss': 0.7983, 'learning_rate': 3.8566153846153844e-05, 'epoch': 0.69}


 23%|██▎       | 27900/121875 [8:34:34<28:45:18,  1.10s/it]

{'loss': 0.7644, 'learning_rate': 3.8553846153846156e-05, 'epoch': 0.69}


 23%|██▎       | 27930/121875 [8:35:07<28:39:34,  1.10s/it]

{'loss': 0.8199, 'learning_rate': 3.854153846153846e-05, 'epoch': 0.69}


 23%|██▎       | 27960/121875 [8:35:40<28:43:40,  1.10s/it]

{'loss': 0.7423, 'learning_rate': 3.8529230769230775e-05, 'epoch': 0.69}


 23%|██▎       | 27990/121875 [8:36:13<28:51:53,  1.11s/it]

{'loss': 0.7891, 'learning_rate': 3.851692307692308e-05, 'epoch': 0.69}


 23%|██▎       | 28020/121875 [8:36:48<28:37:17,  1.10s/it]

{'loss': 0.7653, 'learning_rate': 3.8504615384615386e-05, 'epoch': 0.69}


 23%|██▎       | 28050/121875 [8:37:21<28:46:19,  1.10s/it]

{'loss': 0.8187, 'learning_rate': 3.849230769230769e-05, 'epoch': 0.69}


 23%|██▎       | 28080/121875 [8:37:54<28:28:48,  1.09s/it]

{'loss': 0.7655, 'learning_rate': 3.848e-05, 'epoch': 0.69}


 23%|██▎       | 28110/121875 [8:38:27<28:39:40,  1.10s/it]

{'loss': 0.7631, 'learning_rate': 3.846769230769231e-05, 'epoch': 0.69}


 23%|██▎       | 28140/121875 [8:39:00<28:53:07,  1.11s/it]

{'loss': 0.7635, 'learning_rate': 3.8455384615384616e-05, 'epoch': 0.69}


 23%|██▎       | 28170/121875 [8:39:34<29:26:01,  1.13s/it]

{'loss': 0.7512, 'learning_rate': 3.844307692307693e-05, 'epoch': 0.69}


 23%|██▎       | 28200/121875 [8:40:08<29:10:19,  1.12s/it]

{'loss': 0.7556, 'learning_rate': 3.8430769230769234e-05, 'epoch': 0.69}


 23%|██▎       | 28230/121875 [8:40:42<29:11:00,  1.12s/it]

{'loss': 0.7964, 'learning_rate': 3.841846153846154e-05, 'epoch': 0.69}


 23%|██▎       | 28260/121875 [8:41:15<29:12:00,  1.12s/it]

{'loss': 0.742, 'learning_rate': 3.8406153846153845e-05, 'epoch': 0.7}


 23%|██▎       | 28290/121875 [8:41:49<29:10:02,  1.12s/it]

{'loss': 0.8384, 'learning_rate': 3.839384615384616e-05, 'epoch': 0.7}


 23%|██▎       | 28320/121875 [8:42:23<29:07:30,  1.12s/it]

{'loss': 0.8407, 'learning_rate': 3.8381538461538464e-05, 'epoch': 0.7}


 23%|██▎       | 28350/121875 [8:42:57<29:06:45,  1.12s/it]

{'loss': 0.8321, 'learning_rate': 3.836923076923077e-05, 'epoch': 0.7}


 23%|██▎       | 28380/121875 [8:43:30<28:49:15,  1.11s/it]

{'loss': 0.7593, 'learning_rate': 3.835692307692308e-05, 'epoch': 0.7}


 23%|██▎       | 28410/121875 [8:44:03<28:33:05,  1.10s/it]

{'loss': 0.8011, 'learning_rate': 3.834461538461539e-05, 'epoch': 0.7}


 23%|██▎       | 28440/121875 [8:44:36<28:24:15,  1.09s/it]

{'loss': 0.7712, 'learning_rate': 3.833230769230769e-05, 'epoch': 0.7}


 23%|██▎       | 28470/121875 [8:45:10<29:04:38,  1.12s/it]

{'loss': 0.7744, 'learning_rate': 3.832e-05, 'epoch': 0.7}


 23%|██▎       | 28500/121875 [8:45:43<29:07:27,  1.12s/it]

{'loss': 0.8126, 'learning_rate': 3.830769230769231e-05, 'epoch': 0.7}


 23%|██▎       | 28530/121875 [8:46:19<29:05:56,  1.12s/it]

{'loss': 0.7592, 'learning_rate': 3.829538461538462e-05, 'epoch': 0.7}


 23%|██▎       | 28560/121875 [8:46:53<29:17:14,  1.13s/it]

{'loss': 0.7765, 'learning_rate': 3.828307692307693e-05, 'epoch': 0.7}


 23%|██▎       | 28590/121875 [8:47:27<28:39:59,  1.11s/it]

{'loss': 0.7577, 'learning_rate': 3.8270769230769235e-05, 'epoch': 0.7}


 23%|██▎       | 28620/121875 [8:48:00<28:35:16,  1.10s/it]

{'loss': 0.7871, 'learning_rate': 3.825846153846154e-05, 'epoch': 0.7}


 24%|██▎       | 28650/121875 [8:48:33<28:40:00,  1.11s/it]

{'loss': 0.798, 'learning_rate': 3.824615384615385e-05, 'epoch': 0.71}


 24%|██▎       | 28680/121875 [8:49:06<28:34:03,  1.10s/it]

{'loss': 0.8298, 'learning_rate': 3.823384615384615e-05, 'epoch': 0.71}


 24%|██▎       | 28710/121875 [8:49:39<28:39:18,  1.11s/it]

{'loss': 0.7317, 'learning_rate': 3.8221538461538465e-05, 'epoch': 0.71}


 24%|██▎       | 28740/121875 [8:50:12<28:33:21,  1.10s/it]

{'loss': 0.8074, 'learning_rate': 3.820923076923077e-05, 'epoch': 0.71}


 24%|██▎       | 28770/121875 [8:50:45<28:32:58,  1.10s/it]

{'loss': 0.8211, 'learning_rate': 3.819692307692308e-05, 'epoch': 0.71}


 24%|██▎       | 28800/121875 [8:51:18<28:30:12,  1.10s/it]

{'loss': 0.8017, 'learning_rate': 3.818461538461538e-05, 'epoch': 0.71}


 24%|██▎       | 28830/121875 [8:51:51<28:26:56,  1.10s/it]

{'loss': 0.7964, 'learning_rate': 3.8172307692307695e-05, 'epoch': 0.71}


 24%|██▎       | 28860/121875 [8:52:24<28:37:28,  1.11s/it]

{'loss': 0.704, 'learning_rate': 3.816e-05, 'epoch': 0.71}


 24%|██▎       | 28890/121875 [8:52:57<28:28:27,  1.10s/it]

{'loss': 0.8083, 'learning_rate': 3.814769230769231e-05, 'epoch': 0.71}


 24%|██▎       | 28920/121875 [8:53:30<28:26:16,  1.10s/it]

{'loss': 0.7213, 'learning_rate': 3.813538461538462e-05, 'epoch': 0.71}


 24%|██▍       | 28950/121875 [8:54:03<28:36:21,  1.11s/it]

{'loss': 0.7598, 'learning_rate': 3.8123076923076925e-05, 'epoch': 0.71}


 24%|██▍       | 28980/121875 [8:54:37<28:32:54,  1.11s/it]

{'loss': 0.8235, 'learning_rate': 3.811076923076923e-05, 'epoch': 0.71}


 24%|██▍       | 29010/121875 [8:55:12<29:05:36,  1.13s/it]

{'loss': 0.7693, 'learning_rate': 3.8098461538461536e-05, 'epoch': 0.71}


 24%|██▍       | 29040/121875 [8:55:45<28:31:39,  1.11s/it]

{'loss': 0.7949, 'learning_rate': 3.808615384615385e-05, 'epoch': 0.71}


 24%|██▍       | 29070/121875 [8:56:18<28:17:03,  1.10s/it]

{'loss': 0.7442, 'learning_rate': 3.8073846153846154e-05, 'epoch': 0.72}


 24%|██▍       | 29100/121875 [8:56:51<28:31:52,  1.11s/it]

{'loss': 0.8225, 'learning_rate': 3.806153846153847e-05, 'epoch': 0.72}


 24%|██▍       | 29130/121875 [8:57:24<28:20:27,  1.10s/it]

{'loss': 0.7822, 'learning_rate': 3.804923076923077e-05, 'epoch': 0.72}


 24%|██▍       | 29160/121875 [8:57:57<28:21:44,  1.10s/it]

{'loss': 0.8062, 'learning_rate': 3.803692307692308e-05, 'epoch': 0.72}


 24%|██▍       | 29190/121875 [8:58:30<28:16:38,  1.10s/it]

{'loss': 0.8222, 'learning_rate': 3.8024615384615384e-05, 'epoch': 0.72}


 24%|██▍       | 29220/121875 [8:59:03<28:12:36,  1.10s/it]

{'loss': 0.8245, 'learning_rate': 3.801230769230769e-05, 'epoch': 0.72}


 24%|██▍       | 29250/121875 [8:59:36<28:16:01,  1.10s/it]

{'loss': 0.7591, 'learning_rate': 3.8e-05, 'epoch': 0.72}


 24%|██▍       | 29280/121875 [9:00:09<28:11:28,  1.10s/it]

{'loss': 0.7408, 'learning_rate': 3.798769230769231e-05, 'epoch': 0.72}


 24%|██▍       | 29310/121875 [9:00:42<28:08:09,  1.09s/it]

{'loss': 0.7657, 'learning_rate': 3.797538461538462e-05, 'epoch': 0.72}


 24%|██▍       | 29340/121875 [9:01:15<28:17:53,  1.10s/it]

{'loss': 0.7757, 'learning_rate': 3.7963076923076926e-05, 'epoch': 0.72}


 24%|██▍       | 29370/121875 [9:01:48<28:13:44,  1.10s/it]

{'loss': 0.7564, 'learning_rate': 3.795076923076923e-05, 'epoch': 0.72}


 24%|██▍       | 29400/121875 [9:02:21<28:17:18,  1.10s/it]

{'loss': 0.7542, 'learning_rate': 3.793846153846154e-05, 'epoch': 0.72}


 24%|██▍       | 29430/121875 [9:02:54<28:07:33,  1.10s/it]

{'loss': 0.7718, 'learning_rate': 3.792615384615385e-05, 'epoch': 0.72}


 24%|██▍       | 29460/121875 [9:03:27<28:11:47,  1.10s/it]

{'loss': 0.7727, 'learning_rate': 3.7913846153846156e-05, 'epoch': 0.73}


 24%|██▍       | 29490/121875 [9:03:59<27:40:41,  1.08s/it]

{'loss': 0.7366, 'learning_rate': 3.790153846153847e-05, 'epoch': 0.73}


 24%|██▍       | 29520/121875 [9:04:34<27:35:08,  1.08s/it]

{'loss': 0.7332, 'learning_rate': 3.7889230769230774e-05, 'epoch': 0.73}


 24%|██▍       | 29550/121875 [9:05:06<28:10:54,  1.10s/it]

{'loss': 0.6752, 'learning_rate': 3.787692307692308e-05, 'epoch': 0.73}


 24%|██▍       | 29580/121875 [9:05:39<28:06:48,  1.10s/it]

{'loss': 0.8361, 'learning_rate': 3.7864615384615385e-05, 'epoch': 0.73}


 24%|██▍       | 29610/121875 [9:06:12<28:06:37,  1.10s/it]

{'loss': 0.7773, 'learning_rate': 3.785230769230769e-05, 'epoch': 0.73}


 24%|██▍       | 29640/121875 [9:06:45<28:14:41,  1.10s/it]

{'loss': 0.776, 'learning_rate': 3.7840000000000004e-05, 'epoch': 0.73}


 24%|██▍       | 29670/121875 [9:07:18<28:09:52,  1.10s/it]

{'loss': 0.744, 'learning_rate': 3.782769230769231e-05, 'epoch': 0.73}


 24%|██▍       | 29700/121875 [9:07:51<28:05:36,  1.10s/it]

{'loss': 0.8104, 'learning_rate': 3.781538461538462e-05, 'epoch': 0.73}


 24%|██▍       | 29730/121875 [9:08:24<28:11:23,  1.10s/it]

{'loss': 0.7887, 'learning_rate': 3.780307692307693e-05, 'epoch': 0.73}


 24%|██▍       | 29760/121875 [9:08:57<28:10:05,  1.10s/it]

{'loss': 0.7835, 'learning_rate': 3.779076923076923e-05, 'epoch': 0.73}


 24%|██▍       | 29790/121875 [9:09:30<28:12:33,  1.10s/it]

{'loss': 0.7804, 'learning_rate': 3.777846153846154e-05, 'epoch': 0.73}


 24%|██▍       | 29820/121875 [9:10:03<27:59:26,  1.09s/it]

{'loss': 0.7939, 'learning_rate': 3.7766153846153845e-05, 'epoch': 0.73}


 24%|██▍       | 29850/121875 [9:10:36<27:57:53,  1.09s/it]

{'loss': 0.7905, 'learning_rate': 3.775384615384616e-05, 'epoch': 0.73}


 25%|██▍       | 29880/121875 [9:11:09<28:03:14,  1.10s/it]

{'loss': 0.8035, 'learning_rate': 3.774153846153846e-05, 'epoch': 0.74}


 25%|██▍       | 29910/121875 [9:11:42<27:59:29,  1.10s/it]

{'loss': 0.7826, 'learning_rate': 3.7729230769230775e-05, 'epoch': 0.74}


 25%|██▍       | 29940/121875 [9:12:15<28:04:19,  1.10s/it]

{'loss': 0.7826, 'learning_rate': 3.7716923076923074e-05, 'epoch': 0.74}


 25%|██▍       | 29970/121875 [9:12:48<27:58:03,  1.10s/it]

{'loss': 0.8196, 'learning_rate': 3.770461538461539e-05, 'epoch': 0.74}


 25%|██▍       | 30000/121875 [9:13:21<28:01:23,  1.10s/it]

{'loss': 0.7704, 'learning_rate': 3.769230769230769e-05, 'epoch': 0.74}


 25%|██▍       | 30030/121875 [9:13:56<28:10:42,  1.10s/it]

{'loss': 0.7862, 'learning_rate': 3.7680000000000005e-05, 'epoch': 0.74}


 25%|██▍       | 30060/121875 [9:14:29<28:49:43,  1.13s/it]

{'loss': 0.8246, 'learning_rate': 3.766769230769231e-05, 'epoch': 0.74}


 25%|██▍       | 30090/121875 [9:15:02<28:08:24,  1.10s/it]

{'loss': 0.7871, 'learning_rate': 3.7655384615384617e-05, 'epoch': 0.74}


 25%|██▍       | 30120/121875 [9:15:36<28:11:29,  1.11s/it]

{'loss': 0.8057, 'learning_rate': 3.764307692307692e-05, 'epoch': 0.74}


 25%|██▍       | 30150/121875 [9:16:09<28:02:04,  1.10s/it]

{'loss': 0.8289, 'learning_rate': 3.763076923076923e-05, 'epoch': 0.74}


 25%|██▍       | 30180/121875 [9:16:42<28:16:07,  1.11s/it]

{'loss': 0.8061, 'learning_rate': 3.761846153846154e-05, 'epoch': 0.74}


 25%|██▍       | 30210/121875 [9:17:15<28:05:50,  1.10s/it]

{'loss': 0.7985, 'learning_rate': 3.7606153846153846e-05, 'epoch': 0.74}


 25%|██▍       | 30240/121875 [9:17:48<28:03:27,  1.10s/it]

{'loss': 0.7803, 'learning_rate': 3.759384615384616e-05, 'epoch': 0.74}


 25%|██▍       | 30270/121875 [9:18:21<28:01:37,  1.10s/it]

{'loss': 0.8017, 'learning_rate': 3.7581538461538465e-05, 'epoch': 0.75}


 25%|██▍       | 30300/121875 [9:18:54<28:09:58,  1.11s/it]

{'loss': 0.779, 'learning_rate': 3.756923076923077e-05, 'epoch': 0.75}


 25%|██▍       | 30330/121875 [9:19:27<28:06:59,  1.11s/it]

{'loss': 0.7622, 'learning_rate': 3.7556923076923076e-05, 'epoch': 0.75}


 25%|██▍       | 30360/121875 [9:20:00<27:55:21,  1.10s/it]

{'loss': 0.7911, 'learning_rate': 3.754461538461539e-05, 'epoch': 0.75}


 25%|██▍       | 30390/121875 [9:20:33<28:06:36,  1.11s/it]

{'loss': 0.7556, 'learning_rate': 3.7532307692307694e-05, 'epoch': 0.75}


 25%|██▍       | 30420/121875 [9:21:06<27:54:43,  1.10s/it]

{'loss': 0.7457, 'learning_rate': 3.752e-05, 'epoch': 0.75}


 25%|██▍       | 30450/121875 [9:21:40<28:02:54,  1.10s/it]

{'loss': 0.7608, 'learning_rate': 3.750769230769231e-05, 'epoch': 0.75}


 25%|██▌       | 30480/121875 [9:22:13<27:59:47,  1.10s/it]

{'loss': 0.832, 'learning_rate': 3.749538461538462e-05, 'epoch': 0.75}


 25%|██▌       | 30510/121875 [9:22:48<29:17:36,  1.15s/it]

{'loss': 0.769, 'learning_rate': 3.7483076923076924e-05, 'epoch': 0.75}


 25%|██▌       | 30540/121875 [9:23:22<28:37:18,  1.13s/it]

{'loss': 0.8329, 'learning_rate': 3.747076923076923e-05, 'epoch': 0.75}


 25%|██▌       | 30570/121875 [9:23:55<27:48:37,  1.10s/it]

{'loss': 0.7983, 'learning_rate': 3.745846153846154e-05, 'epoch': 0.75}


 25%|██▌       | 30600/121875 [9:24:28<27:53:54,  1.10s/it]

{'loss': 0.777, 'learning_rate': 3.744615384615385e-05, 'epoch': 0.75}


 25%|██▌       | 30630/121875 [9:25:01<28:26:48,  1.12s/it]

{'loss': 0.7497, 'learning_rate': 3.743384615384616e-05, 'epoch': 0.75}


 25%|██▌       | 30660/121875 [9:25:35<28:29:05,  1.12s/it]

{'loss': 0.7723, 'learning_rate': 3.7421538461538466e-05, 'epoch': 0.75}


 25%|██▌       | 30690/121875 [9:26:09<28:30:25,  1.13s/it]

{'loss': 0.7868, 'learning_rate': 3.740923076923077e-05, 'epoch': 0.76}


 25%|██▌       | 30720/121875 [9:26:42<28:23:09,  1.12s/it]

{'loss': 0.724, 'learning_rate': 3.739692307692308e-05, 'epoch': 0.76}


 25%|██▌       | 30750/121875 [9:27:16<28:32:43,  1.13s/it]

{'loss': 0.7755, 'learning_rate': 3.738461538461538e-05, 'epoch': 0.76}


 25%|██▌       | 30780/121875 [9:27:50<28:35:59,  1.13s/it]

{'loss': 0.7986, 'learning_rate': 3.7372307692307696e-05, 'epoch': 0.76}


 25%|██▌       | 30810/121875 [9:28:24<28:31:35,  1.13s/it]

{'loss': 0.7999, 'learning_rate': 3.736e-05, 'epoch': 0.76}


 25%|██▌       | 30840/121875 [9:28:58<28:39:00,  1.13s/it]

{'loss': 0.7371, 'learning_rate': 3.7347692307692314e-05, 'epoch': 0.76}


 25%|██▌       | 30870/121875 [9:29:32<28:39:10,  1.13s/it]

{'loss': 0.7872, 'learning_rate': 3.733538461538462e-05, 'epoch': 0.76}


 25%|██▌       | 30900/121875 [9:30:06<28:34:55,  1.13s/it]

{'loss': 0.7639, 'learning_rate': 3.7323076923076925e-05, 'epoch': 0.76}


 25%|██▌       | 30930/121875 [9:30:39<28:27:31,  1.13s/it]

{'loss': 0.8017, 'learning_rate': 3.731076923076923e-05, 'epoch': 0.76}


 25%|██▌       | 30960/121875 [9:31:13<28:32:01,  1.13s/it]

{'loss': 0.7381, 'learning_rate': 3.729846153846154e-05, 'epoch': 0.76}


 25%|██▌       | 30990/121875 [9:31:47<28:24:39,  1.13s/it]

{'loss': 0.7153, 'learning_rate': 3.728615384615385e-05, 'epoch': 0.76}


 25%|██▌       | 31020/121875 [9:32:23<28:28:33,  1.13s/it]

{'loss': 0.7127, 'learning_rate': 3.7273846153846155e-05, 'epoch': 0.76}


 25%|██▌       | 31050/121875 [9:32:57<27:49:36,  1.10s/it]

{'loss': 0.7556, 'learning_rate': 3.726153846153847e-05, 'epoch': 0.76}


 26%|██▌       | 31080/121875 [9:33:29<27:09:17,  1.08s/it]

{'loss': 0.7926, 'learning_rate': 3.7249230769230767e-05, 'epoch': 0.77}


 26%|██▌       | 31110/121875 [9:34:01<27:06:01,  1.07s/it]

{'loss': 0.7803, 'learning_rate': 3.723692307692308e-05, 'epoch': 0.77}


 26%|██▌       | 31140/121875 [9:34:34<27:12:12,  1.08s/it]

{'loss': 0.765, 'learning_rate': 3.7224615384615385e-05, 'epoch': 0.77}


 26%|██▌       | 31170/121875 [9:35:06<27:13:47,  1.08s/it]

{'loss': 0.764, 'learning_rate': 3.72123076923077e-05, 'epoch': 0.77}


 26%|██▌       | 31200/121875 [9:35:39<27:18:19,  1.08s/it]

{'loss': 0.7895, 'learning_rate': 3.72e-05, 'epoch': 0.77}


 26%|██▌       | 31230/121875 [9:36:11<27:11:23,  1.08s/it]

{'loss': 0.8096, 'learning_rate': 3.718769230769231e-05, 'epoch': 0.77}


 26%|██▌       | 31260/121875 [9:36:44<27:13:12,  1.08s/it]

{'loss': 0.8068, 'learning_rate': 3.7175384615384614e-05, 'epoch': 0.77}


 26%|██▌       | 31290/121875 [9:37:16<27:08:25,  1.08s/it]

{'loss': 0.818, 'learning_rate': 3.716307692307692e-05, 'epoch': 0.77}


 26%|██▌       | 31320/121875 [9:37:48<27:11:11,  1.08s/it]

{'loss': 0.7566, 'learning_rate': 3.715076923076923e-05, 'epoch': 0.77}


 26%|██▌       | 31350/121875 [9:38:21<27:12:48,  1.08s/it]

{'loss': 0.7744, 'learning_rate': 3.713846153846154e-05, 'epoch': 0.77}


 26%|██▌       | 31380/121875 [9:38:53<27:08:25,  1.08s/it]

{'loss': 0.7656, 'learning_rate': 3.712615384615385e-05, 'epoch': 0.77}


 26%|██▌       | 31410/121875 [9:39:26<27:12:51,  1.08s/it]

{'loss': 0.8148, 'learning_rate': 3.7113846153846157e-05, 'epoch': 0.77}


 26%|██▌       | 31440/121875 [9:39:58<27:07:26,  1.08s/it]

{'loss': 0.8048, 'learning_rate': 3.710153846153846e-05, 'epoch': 0.77}


 26%|██▌       | 31470/121875 [9:40:31<27:11:37,  1.08s/it]

{'loss': 0.7739, 'learning_rate': 3.708923076923077e-05, 'epoch': 0.77}


 26%|██▌       | 31500/121875 [9:41:03<27:09:50,  1.08s/it]

{'loss': 0.827, 'learning_rate': 3.707692307692308e-05, 'epoch': 0.78}


 26%|██▌       | 31530/121875 [9:41:38<27:12:11,  1.08s/it]

{'loss': 0.7778, 'learning_rate': 3.7064615384615386e-05, 'epoch': 0.78}


 26%|██▌       | 31560/121875 [9:42:11<27:27:25,  1.09s/it]

{'loss': 0.764, 'learning_rate': 3.705230769230769e-05, 'epoch': 0.78}


 26%|██▌       | 31590/121875 [9:42:43<27:02:33,  1.08s/it]

{'loss': 0.8234, 'learning_rate': 3.7040000000000005e-05, 'epoch': 0.78}


 26%|██▌       | 31620/121875 [9:43:16<27:06:10,  1.08s/it]

{'loss': 0.7151, 'learning_rate': 3.702769230769231e-05, 'epoch': 0.78}


 26%|██▌       | 31650/121875 [9:43:48<27:00:57,  1.08s/it]

{'loss': 0.8013, 'learning_rate': 3.7015384615384616e-05, 'epoch': 0.78}


 26%|██▌       | 31680/121875 [9:44:21<27:15:16,  1.09s/it]

{'loss': 0.7105, 'learning_rate': 3.700307692307692e-05, 'epoch': 0.78}


 26%|██▌       | 31710/121875 [9:44:53<27:10:30,  1.09s/it]

{'loss': 0.828, 'learning_rate': 3.6990769230769234e-05, 'epoch': 0.78}


 26%|██▌       | 31740/121875 [9:45:26<27:03:27,  1.08s/it]

{'loss': 0.7923, 'learning_rate': 3.697846153846154e-05, 'epoch': 0.78}


 26%|██▌       | 31770/121875 [9:45:58<26:59:03,  1.08s/it]

{'loss': 0.7244, 'learning_rate': 3.696615384615385e-05, 'epoch': 0.78}


 26%|██▌       | 31800/121875 [9:46:31<26:57:30,  1.08s/it]

{'loss': 0.8422, 'learning_rate': 3.695384615384616e-05, 'epoch': 0.78}


 26%|██▌       | 31830/121875 [9:47:03<27:04:30,  1.08s/it]

{'loss': 0.754, 'learning_rate': 3.6941538461538464e-05, 'epoch': 0.78}


 26%|██▌       | 31860/121875 [9:47:35<26:48:37,  1.07s/it]

{'loss': 0.7225, 'learning_rate': 3.692923076923077e-05, 'epoch': 0.78}


 26%|██▌       | 31890/121875 [9:48:08<26:57:21,  1.08s/it]

{'loss': 0.8024, 'learning_rate': 3.6916923076923075e-05, 'epoch': 0.78}


 26%|██▌       | 31920/121875 [9:48:40<27:00:46,  1.08s/it]

{'loss': 0.726, 'learning_rate': 3.690461538461539e-05, 'epoch': 0.79}


 26%|██▌       | 31950/121875 [9:49:13<26:50:45,  1.07s/it]

{'loss': 0.7739, 'learning_rate': 3.6892307692307694e-05, 'epoch': 0.79}


 26%|██▌       | 31980/121875 [9:49:45<27:06:46,  1.09s/it]

{'loss': 0.7473, 'learning_rate': 3.6880000000000006e-05, 'epoch': 0.79}


 26%|██▋       | 32010/121875 [9:50:20<27:43:10,  1.11s/it]

{'loss': 0.807, 'learning_rate': 3.686769230769231e-05, 'epoch': 0.79}


 26%|██▋       | 32040/121875 [9:50:52<26:55:03,  1.08s/it]

{'loss': 0.8404, 'learning_rate': 3.685538461538462e-05, 'epoch': 0.79}


 26%|██▋       | 32070/121875 [9:51:24<26:56:27,  1.08s/it]

{'loss': 0.7192, 'learning_rate': 3.684307692307692e-05, 'epoch': 0.79}


 26%|██▋       | 32100/121875 [9:51:57<27:14:57,  1.09s/it]

{'loss': 0.7671, 'learning_rate': 3.683076923076923e-05, 'epoch': 0.79}


 26%|██▋       | 32130/121875 [9:52:29<26:55:05,  1.08s/it]

{'loss': 0.8298, 'learning_rate': 3.681846153846154e-05, 'epoch': 0.79}


 26%|██▋       | 32160/121875 [9:53:02<27:06:58,  1.09s/it]

{'loss': 0.7814, 'learning_rate': 3.680615384615385e-05, 'epoch': 0.79}


 26%|██▋       | 32190/121875 [9:53:34<26:56:19,  1.08s/it]

{'loss': 0.8102, 'learning_rate': 3.679384615384616e-05, 'epoch': 0.79}


 26%|██▋       | 32220/121875 [9:54:07<27:01:18,  1.09s/it]

{'loss': 0.7382, 'learning_rate': 3.678153846153846e-05, 'epoch': 0.79}


 26%|██▋       | 32250/121875 [9:54:40<28:00:08,  1.12s/it]

{'loss': 0.751, 'learning_rate': 3.676923076923077e-05, 'epoch': 0.79}


 26%|██▋       | 32280/121875 [9:55:14<28:23:56,  1.14s/it]

{'loss': 0.758, 'learning_rate': 3.675692307692308e-05, 'epoch': 0.79}


 27%|██▋       | 32310/121875 [9:55:48<28:11:38,  1.13s/it]

{'loss': 0.7411, 'learning_rate': 3.674461538461539e-05, 'epoch': 0.8}


 27%|██▋       | 32340/121875 [9:56:22<28:05:02,  1.13s/it]

{'loss': 0.7956, 'learning_rate': 3.6732307692307695e-05, 'epoch': 0.8}


 27%|██▋       | 32370/121875 [9:56:56<27:53:15,  1.12s/it]

{'loss': 0.7573, 'learning_rate': 3.672000000000001e-05, 'epoch': 0.8}


 27%|██▋       | 32400/121875 [9:57:30<28:08:18,  1.13s/it]

{'loss': 0.7799, 'learning_rate': 3.6707692307692307e-05, 'epoch': 0.8}


 27%|██▋       | 32430/121875 [9:58:03<27:57:53,  1.13s/it]

{'loss': 0.8176, 'learning_rate': 3.669538461538461e-05, 'epoch': 0.8}


 27%|██▋       | 32460/121875 [9:58:37<27:57:10,  1.13s/it]

{'loss': 0.7498, 'learning_rate': 3.6683076923076925e-05, 'epoch': 0.8}


 27%|██▋       | 32490/121875 [9:59:11<27:58:12,  1.13s/it]

{'loss': 0.7526, 'learning_rate': 3.667076923076923e-05, 'epoch': 0.8}


 27%|██▋       | 32520/121875 [9:59:47<27:53:54,  1.12s/it]

{'loss': 0.8004, 'learning_rate': 3.665846153846154e-05, 'epoch': 0.8}


 27%|██▋       | 32550/121875 [10:00:21<28:02:28,  1.13s/it]

{'loss': 0.7701, 'learning_rate': 3.664615384615385e-05, 'epoch': 0.8}


 27%|██▋       | 32580/121875 [10:00:55<28:11:06,  1.14s/it]

{'loss': 0.7598, 'learning_rate': 3.6633846153846154e-05, 'epoch': 0.8}


 27%|██▋       | 32610/121875 [10:01:29<28:11:19,  1.14s/it]

{'loss': 0.7707, 'learning_rate': 3.662153846153846e-05, 'epoch': 0.8}


 27%|██▋       | 32640/121875 [10:02:03<28:09:35,  1.14s/it]

{'loss': 0.7593, 'learning_rate': 3.660923076923077e-05, 'epoch': 0.8}


 27%|██▋       | 32670/121875 [10:02:36<27:10:44,  1.10s/it]

{'loss': 0.8174, 'learning_rate': 3.659692307692308e-05, 'epoch': 0.8}


 27%|██▋       | 32700/121875 [10:03:09<27:06:23,  1.09s/it]

{'loss': 0.8581, 'learning_rate': 3.6584615384615384e-05, 'epoch': 0.8}


 27%|██▋       | 32730/121875 [10:03:42<26:44:19,  1.08s/it]

{'loss': 0.744, 'learning_rate': 3.6572307692307697e-05, 'epoch': 0.81}


 27%|██▋       | 32760/121875 [10:04:15<26:41:48,  1.08s/it]

{'loss': 0.7752, 'learning_rate': 3.656e-05, 'epoch': 0.81}


 27%|██▋       | 32790/121875 [10:04:47<27:03:16,  1.09s/it]

{'loss': 0.765, 'learning_rate': 3.654769230769231e-05, 'epoch': 0.81}


 27%|██▋       | 32820/121875 [10:05:20<27:11:57,  1.10s/it]

{'loss': 0.7923, 'learning_rate': 3.6535384615384614e-05, 'epoch': 0.81}


 27%|██▋       | 32850/121875 [10:05:53<27:25:32,  1.11s/it]

{'loss': 0.806, 'learning_rate': 3.6523076923076926e-05, 'epoch': 0.81}


 27%|██▋       | 32880/121875 [10:06:26<27:00:47,  1.09s/it]

{'loss': 0.7598, 'learning_rate': 3.651076923076923e-05, 'epoch': 0.81}


 27%|██▋       | 32910/121875 [10:06:59<27:00:28,  1.09s/it]

{'loss': 0.8405, 'learning_rate': 3.6498461538461545e-05, 'epoch': 0.81}


 27%|██▋       | 32940/121875 [10:07:31<27:07:52,  1.10s/it]

{'loss': 0.7477, 'learning_rate': 3.648615384615385e-05, 'epoch': 0.81}


 27%|██▋       | 32970/121875 [10:08:04<27:11:00,  1.10s/it]

{'loss': 0.7706, 'learning_rate': 3.6473846153846156e-05, 'epoch': 0.81}


 27%|██▋       | 33000/121875 [10:08:37<26:51:24,  1.09s/it]

{'loss': 0.7038, 'learning_rate': 3.646153846153846e-05, 'epoch': 0.81}


 27%|██▋       | 33030/121875 [10:09:12<27:11:41,  1.10s/it]

{'loss': 0.7469, 'learning_rate': 3.644923076923077e-05, 'epoch': 0.81}


 27%|██▋       | 33060/121875 [10:09:45<26:55:55,  1.09s/it]

{'loss': 0.7277, 'learning_rate': 3.643692307692308e-05, 'epoch': 0.81}


 27%|██▋       | 33090/121875 [10:10:18<27:05:15,  1.10s/it]

{'loss': 0.7663, 'learning_rate': 3.6424615384615386e-05, 'epoch': 0.81}


 27%|██▋       | 33120/121875 [10:10:51<27:13:13,  1.10s/it]

{'loss': 0.78, 'learning_rate': 3.64123076923077e-05, 'epoch': 0.82}


 27%|██▋       | 33150/121875 [10:11:24<27:04:09,  1.10s/it]

{'loss': 0.798, 'learning_rate': 3.6400000000000004e-05, 'epoch': 0.82}


 27%|██▋       | 33180/121875 [10:11:57<26:58:14,  1.09s/it]

{'loss': 0.7586, 'learning_rate': 3.638769230769231e-05, 'epoch': 0.82}


 27%|██▋       | 33210/121875 [10:12:29<26:48:31,  1.09s/it]

{'loss': 0.7803, 'learning_rate': 3.6375384615384615e-05, 'epoch': 0.82}


 27%|██▋       | 33240/121875 [10:13:02<27:15:27,  1.11s/it]

{'loss': 0.756, 'learning_rate': 3.636307692307693e-05, 'epoch': 0.82}


 27%|██▋       | 33270/121875 [10:13:35<26:54:22,  1.09s/it]

{'loss': 0.7534, 'learning_rate': 3.6350769230769234e-05, 'epoch': 0.82}


 27%|██▋       | 33300/121875 [10:14:08<27:01:09,  1.10s/it]

{'loss': 0.809, 'learning_rate': 3.633846153846154e-05, 'epoch': 0.82}


 27%|██▋       | 33330/121875 [10:14:41<26:49:38,  1.09s/it]

{'loss': 0.785, 'learning_rate': 3.632615384615385e-05, 'epoch': 0.82}


 27%|██▋       | 33360/121875 [10:15:14<27:06:54,  1.10s/it]

{'loss': 0.7353, 'learning_rate': 3.631384615384615e-05, 'epoch': 0.82}


 27%|██▋       | 33390/121875 [10:15:47<26:42:21,  1.09s/it]

{'loss': 0.8171, 'learning_rate': 3.630153846153846e-05, 'epoch': 0.82}


 27%|██▋       | 33420/121875 [10:16:20<26:47:15,  1.09s/it]

{'loss': 0.7972, 'learning_rate': 3.628923076923077e-05, 'epoch': 0.82}


 27%|██▋       | 33450/121875 [10:16:52<26:58:25,  1.10s/it]

{'loss': 0.8089, 'learning_rate': 3.627692307692308e-05, 'epoch': 0.82}


 27%|██▋       | 33480/121875 [10:17:25<27:06:31,  1.10s/it]

{'loss': 0.7944, 'learning_rate': 3.626461538461539e-05, 'epoch': 0.82}


 27%|██▋       | 33510/121875 [10:18:00<27:25:39,  1.12s/it]

{'loss': 0.742, 'learning_rate': 3.62523076923077e-05, 'epoch': 0.82}


 28%|██▊       | 33540/121875 [10:18:34<27:38:57,  1.13s/it]

{'loss': 0.803, 'learning_rate': 3.624e-05, 'epoch': 0.83}


 28%|██▊       | 33570/121875 [10:19:08<27:53:11,  1.14s/it]

{'loss': 0.7706, 'learning_rate': 3.6227692307692304e-05, 'epoch': 0.83}


 28%|██▊       | 33600/121875 [10:19:42<27:41:47,  1.13s/it]

{'loss': 0.7539, 'learning_rate': 3.621538461538462e-05, 'epoch': 0.83}


 28%|██▊       | 33630/121875 [10:20:16<27:56:52,  1.14s/it]

{'loss': 0.8409, 'learning_rate': 3.620307692307692e-05, 'epoch': 0.83}


 28%|██▊       | 33660/121875 [10:20:48<26:39:04,  1.09s/it]

{'loss': 0.8015, 'learning_rate': 3.6190769230769235e-05, 'epoch': 0.83}


 28%|██▊       | 33690/121875 [10:21:21<26:51:26,  1.10s/it]

{'loss': 0.7561, 'learning_rate': 3.617846153846154e-05, 'epoch': 0.83}


 28%|██▊       | 33720/121875 [10:21:54<26:42:41,  1.09s/it]

{'loss': 0.7519, 'learning_rate': 3.6166153846153847e-05, 'epoch': 0.83}


 28%|██▊       | 33750/121875 [10:22:27<26:56:40,  1.10s/it]

{'loss': 0.7644, 'learning_rate': 3.615384615384615e-05, 'epoch': 0.83}


 28%|██▊       | 33780/121875 [10:23:00<26:51:33,  1.10s/it]

{'loss': 0.7833, 'learning_rate': 3.6141538461538465e-05, 'epoch': 0.83}


 28%|██▊       | 33810/121875 [10:23:33<26:57:38,  1.10s/it]

{'loss': 0.7839, 'learning_rate': 3.612923076923077e-05, 'epoch': 0.83}


 28%|██▊       | 33840/121875 [10:24:06<26:57:32,  1.10s/it]

{'loss': 0.7883, 'learning_rate': 3.6116923076923076e-05, 'epoch': 0.83}


 28%|██▊       | 33870/121875 [10:24:39<26:52:14,  1.10s/it]

{'loss': 0.7619, 'learning_rate': 3.610461538461539e-05, 'epoch': 0.83}


 28%|██▊       | 33900/121875 [10:25:12<26:50:47,  1.10s/it]

{'loss': 0.7923, 'learning_rate': 3.6092307692307694e-05, 'epoch': 0.83}


 28%|██▊       | 33930/121875 [10:25:45<26:42:43,  1.09s/it]

{'loss': 0.7725, 'learning_rate': 3.608e-05, 'epoch': 0.84}


 28%|██▊       | 33960/121875 [10:26:18<26:42:08,  1.09s/it]

{'loss': 0.794, 'learning_rate': 3.6067692307692306e-05, 'epoch': 0.84}


 28%|██▊       | 33990/121875 [10:26:50<26:30:24,  1.09s/it]

{'loss': 0.7209, 'learning_rate': 3.605538461538462e-05, 'epoch': 0.84}


 28%|██▊       | 34020/121875 [10:27:25<26:50:59,  1.10s/it]

{'loss': 0.8002, 'learning_rate': 3.6043076923076924e-05, 'epoch': 0.84}


 28%|██▊       | 34050/121875 [10:27:59<26:54:21,  1.10s/it]

{'loss': 0.8366, 'learning_rate': 3.6030769230769237e-05, 'epoch': 0.84}


 28%|██▊       | 34080/121875 [10:28:32<26:54:59,  1.10s/it]

{'loss': 0.7952, 'learning_rate': 3.601846153846154e-05, 'epoch': 0.84}


 28%|██▊       | 34110/121875 [10:29:05<26:55:24,  1.10s/it]

{'loss': 0.7572, 'learning_rate': 3.600615384615385e-05, 'epoch': 0.84}


 28%|██▊       | 34140/121875 [10:29:38<26:47:14,  1.10s/it]

{'loss': 0.782, 'learning_rate': 3.5993846153846154e-05, 'epoch': 0.84}


 28%|██▊       | 34170/121875 [10:30:11<26:51:40,  1.10s/it]

{'loss': 0.7302, 'learning_rate': 3.598153846153846e-05, 'epoch': 0.84}


 28%|██▊       | 34200/121875 [10:30:44<26:53:51,  1.10s/it]

{'loss': 0.7525, 'learning_rate': 3.596923076923077e-05, 'epoch': 0.84}


 28%|██▊       | 34230/121875 [10:31:17<26:49:37,  1.10s/it]

{'loss': 0.7373, 'learning_rate': 3.595692307692308e-05, 'epoch': 0.84}


 28%|██▊       | 34260/121875 [10:31:50<26:47:05,  1.10s/it]

{'loss': 0.7527, 'learning_rate': 3.594461538461539e-05, 'epoch': 0.84}


 28%|██▊       | 34290/121875 [10:32:23<26:51:39,  1.10s/it]

{'loss': 0.7778, 'learning_rate': 3.5932307692307696e-05, 'epoch': 0.84}


 28%|██▊       | 34320/121875 [10:32:56<26:57:42,  1.11s/it]

{'loss': 0.7612, 'learning_rate': 3.592e-05, 'epoch': 0.84}


 28%|██▊       | 34350/121875 [10:33:29<26:40:53,  1.10s/it]

{'loss': 0.7253, 'learning_rate': 3.590769230769231e-05, 'epoch': 0.85}


 28%|██▊       | 34380/121875 [10:34:02<26:46:08,  1.10s/it]

{'loss': 0.8321, 'learning_rate': 3.589538461538462e-05, 'epoch': 0.85}


 28%|██▊       | 34410/121875 [10:34:35<26:45:12,  1.10s/it]

{'loss': 0.7193, 'learning_rate': 3.5883076923076926e-05, 'epoch': 0.85}


 28%|██▊       | 34440/121875 [10:35:08<26:44:33,  1.10s/it]

{'loss': 0.7552, 'learning_rate': 3.587076923076923e-05, 'epoch': 0.85}


 28%|██▊       | 34470/121875 [10:35:41<26:47:50,  1.10s/it]

{'loss': 0.8023, 'learning_rate': 3.5858461538461544e-05, 'epoch': 0.85}


 28%|██▊       | 34500/121875 [10:36:15<26:47:05,  1.10s/it]

{'loss': 0.7096, 'learning_rate': 3.584615384615384e-05, 'epoch': 0.85}


 28%|██▊       | 34530/121875 [10:36:50<26:48:11,  1.10s/it]

{'loss': 0.7521, 'learning_rate': 3.5833846153846155e-05, 'epoch': 0.85}


 28%|██▊       | 34560/121875 [10:37:23<26:41:34,  1.10s/it]

{'loss': 0.8321, 'learning_rate': 3.582153846153846e-05, 'epoch': 0.85}


 28%|██▊       | 34590/121875 [10:37:56<26:46:34,  1.10s/it]

{'loss': 0.7275, 'learning_rate': 3.5809230769230774e-05, 'epoch': 0.85}


 28%|██▊       | 34620/121875 [10:38:29<26:40:11,  1.10s/it]

{'loss': 0.7649, 'learning_rate': 3.579692307692308e-05, 'epoch': 0.85}


 28%|██▊       | 34650/121875 [10:39:02<26:41:13,  1.10s/it]

{'loss': 0.8429, 'learning_rate': 3.578461538461539e-05, 'epoch': 0.85}


 28%|██▊       | 34680/121875 [10:39:35<26:43:57,  1.10s/it]

{'loss': 0.7861, 'learning_rate': 3.577230769230769e-05, 'epoch': 0.85}


 28%|██▊       | 34710/121875 [10:40:08<26:35:59,  1.10s/it]

{'loss': 0.7407, 'learning_rate': 3.5759999999999996e-05, 'epoch': 0.85}


 29%|██▊       | 34740/121875 [10:40:41<26:40:04,  1.10s/it]

{'loss': 0.7108, 'learning_rate': 3.574769230769231e-05, 'epoch': 0.86}


 29%|██▊       | 34770/121875 [10:41:14<26:44:19,  1.11s/it]

{'loss': 0.7492, 'learning_rate': 3.5735384615384615e-05, 'epoch': 0.86}


 29%|██▊       | 34800/121875 [10:41:47<26:37:09,  1.10s/it]

{'loss': 0.784, 'learning_rate': 3.572307692307693e-05, 'epoch': 0.86}


 29%|██▊       | 34830/121875 [10:42:20<26:32:19,  1.10s/it]

{'loss': 0.6962, 'learning_rate': 3.571076923076923e-05, 'epoch': 0.86}


 29%|██▊       | 34860/121875 [10:42:53<26:39:22,  1.10s/it]

{'loss': 0.7815, 'learning_rate': 3.569846153846154e-05, 'epoch': 0.86}


 29%|██▊       | 34890/121875 [10:43:26<26:32:58,  1.10s/it]

{'loss': 0.8371, 'learning_rate': 3.5686153846153844e-05, 'epoch': 0.86}


 29%|██▊       | 34920/121875 [10:43:59<26:03:40,  1.08s/it]

{'loss': 0.6775, 'learning_rate': 3.567384615384616e-05, 'epoch': 0.86}


 29%|██▊       | 34950/121875 [10:44:31<25:58:10,  1.08s/it]

{'loss': 0.7886, 'learning_rate': 3.566153846153846e-05, 'epoch': 0.86}


 29%|██▊       | 34980/121875 [10:45:04<26:42:34,  1.11s/it]

{'loss': 0.7319, 'learning_rate': 3.5649230769230775e-05, 'epoch': 0.86}


 29%|██▊       | 35010/121875 [10:45:39<27:06:52,  1.12s/it]

{'loss': 0.7761, 'learning_rate': 3.563692307692308e-05, 'epoch': 0.86}


 29%|██▉       | 35040/121875 [10:46:12<26:26:20,  1.10s/it]

{'loss': 0.7788, 'learning_rate': 3.5624615384615387e-05, 'epoch': 0.86}


 29%|██▉       | 35070/121875 [10:46:45<26:25:55,  1.10s/it]

{'loss': 0.7565, 'learning_rate': 3.561230769230769e-05, 'epoch': 0.86}


 29%|██▉       | 35100/121875 [10:47:18<26:33:14,  1.10s/it]

{'loss': 0.7929, 'learning_rate': 3.56e-05, 'epoch': 0.86}


 29%|██▉       | 35130/121875 [10:47:51<26:32:30,  1.10s/it]

{'loss': 0.7587, 'learning_rate': 3.558769230769231e-05, 'epoch': 0.86}


 29%|██▉       | 35160/121875 [10:48:24<26:31:16,  1.10s/it]

{'loss': 0.8131, 'learning_rate': 3.5575384615384616e-05, 'epoch': 0.87}


 29%|██▉       | 35190/121875 [10:48:57<26:22:58,  1.10s/it]

{'loss': 0.7253, 'learning_rate': 3.556307692307693e-05, 'epoch': 0.87}


 29%|██▉       | 35220/121875 [10:49:30<26:28:54,  1.10s/it]

{'loss': 0.7961, 'learning_rate': 3.5550769230769234e-05, 'epoch': 0.87}


 29%|██▉       | 35250/121875 [10:50:03<26:27:04,  1.10s/it]

{'loss': 0.8044, 'learning_rate': 3.553846153846154e-05, 'epoch': 0.87}


 29%|██▉       | 35280/121875 [10:50:36<26:29:45,  1.10s/it]

{'loss': 0.7958, 'learning_rate': 3.5526153846153846e-05, 'epoch': 0.87}


 29%|██▉       | 35310/121875 [10:51:09<26:28:02,  1.10s/it]

{'loss': 0.7808, 'learning_rate': 3.551384615384615e-05, 'epoch': 0.87}


 29%|██▉       | 35340/121875 [10:51:42<26:28:50,  1.10s/it]

{'loss': 0.8063, 'learning_rate': 3.5501538461538464e-05, 'epoch': 0.87}


 29%|██▉       | 35370/121875 [10:52:15<26:25:59,  1.10s/it]

{'loss': 0.792, 'learning_rate': 3.548923076923077e-05, 'epoch': 0.87}


 29%|██▉       | 35400/121875 [10:52:48<26:29:27,  1.10s/it]

{'loss': 0.7644, 'learning_rate': 3.547692307692308e-05, 'epoch': 0.87}


 29%|██▉       | 35430/121875 [10:53:21<26:21:10,  1.10s/it]

{'loss': 0.735, 'learning_rate': 3.546461538461539e-05, 'epoch': 0.87}


 29%|██▉       | 35460/121875 [10:53:54<26:17:21,  1.10s/it]

{'loss': 0.6919, 'learning_rate': 3.5452307692307694e-05, 'epoch': 0.87}


 29%|██▉       | 35490/121875 [10:54:27<26:21:15,  1.10s/it]

{'loss': 0.7523, 'learning_rate': 3.544e-05, 'epoch': 0.87}


 29%|██▉       | 35520/121875 [10:55:02<26:18:51,  1.10s/it]

{'loss': 0.7831, 'learning_rate': 3.542769230769231e-05, 'epoch': 0.87}


 29%|██▉       | 35550/121875 [10:55:35<26:24:50,  1.10s/it]

{'loss': 0.7173, 'learning_rate': 3.541538461538462e-05, 'epoch': 0.88}


 29%|██▉       | 35580/121875 [10:56:08<26:26:54,  1.10s/it]

{'loss': 0.7655, 'learning_rate': 3.5403076923076923e-05, 'epoch': 0.88}


 29%|██▉       | 35610/121875 [10:56:41<26:22:36,  1.10s/it]

{'loss': 0.762, 'learning_rate': 3.5390769230769236e-05, 'epoch': 0.88}


 29%|██▉       | 35640/121875 [10:57:14<25:58:48,  1.08s/it]

{'loss': 0.7507, 'learning_rate': 3.5378461538461535e-05, 'epoch': 0.88}


 29%|██▉       | 35670/121875 [10:57:47<26:20:18,  1.10s/it]

{'loss': 0.7915, 'learning_rate': 3.536615384615385e-05, 'epoch': 0.88}


 29%|██▉       | 35700/121875 [10:58:20<26:17:57,  1.10s/it]

{'loss': 0.7491, 'learning_rate': 3.535384615384615e-05, 'epoch': 0.88}


 29%|██▉       | 35730/121875 [10:58:53<26:23:10,  1.10s/it]

{'loss': 0.7092, 'learning_rate': 3.5341538461538466e-05, 'epoch': 0.88}


 29%|██▉       | 35760/121875 [10:59:26<26:23:39,  1.10s/it]

{'loss': 0.766, 'learning_rate': 3.532923076923077e-05, 'epoch': 0.88}


 29%|██▉       | 35790/121875 [10:59:59<26:10:12,  1.09s/it]

{'loss': 0.7696, 'learning_rate': 3.5316923076923084e-05, 'epoch': 0.88}


 29%|██▉       | 35820/121875 [11:00:32<26:18:53,  1.10s/it]

{'loss': 0.7754, 'learning_rate': 3.530461538461538e-05, 'epoch': 0.88}


 29%|██▉       | 35850/121875 [11:01:05<26:10:54,  1.10s/it]

{'loss': 0.7793, 'learning_rate': 3.5292307692307695e-05, 'epoch': 0.88}


 29%|██▉       | 35880/121875 [11:01:38<26:14:35,  1.10s/it]

{'loss': 0.7278, 'learning_rate': 3.528e-05, 'epoch': 0.88}


 29%|██▉       | 35910/121875 [11:02:11<26:16:51,  1.10s/it]

{'loss': 0.7635, 'learning_rate': 3.526769230769231e-05, 'epoch': 0.88}


 29%|██▉       | 35940/121875 [11:02:44<26:17:44,  1.10s/it]

{'loss': 0.8182, 'learning_rate': 3.525538461538462e-05, 'epoch': 0.88}


 30%|██▉       | 35970/121875 [11:03:17<26:03:27,  1.09s/it]

{'loss': 0.719, 'learning_rate': 3.5243076923076925e-05, 'epoch': 0.89}


 30%|██▉       | 36000/121875 [11:03:50<26:13:19,  1.10s/it]

{'loss': 0.7834, 'learning_rate': 3.523076923076923e-05, 'epoch': 0.89}


 30%|██▉       | 36030/121875 [11:04:25<26:08:27,  1.10s/it]

{'loss': 0.7997, 'learning_rate': 3.5218461538461536e-05, 'epoch': 0.89}


 30%|██▉       | 36060/121875 [11:04:58<26:07:57,  1.10s/it]

{'loss': 0.8195, 'learning_rate': 3.520615384615385e-05, 'epoch': 0.89}


 30%|██▉       | 36090/121875 [11:05:31<26:11:32,  1.10s/it]

{'loss': 0.7888, 'learning_rate': 3.5193846153846155e-05, 'epoch': 0.89}


 30%|██▉       | 36120/121875 [11:06:04<26:18:52,  1.10s/it]

{'loss': 0.7639, 'learning_rate': 3.518153846153847e-05, 'epoch': 0.89}


 30%|██▉       | 36150/121875 [11:06:37<26:07:31,  1.10s/it]

{'loss': 0.7376, 'learning_rate': 3.516923076923077e-05, 'epoch': 0.89}


 30%|██▉       | 36180/121875 [11:07:10<26:04:48,  1.10s/it]

{'loss': 0.7903, 'learning_rate': 3.515692307692308e-05, 'epoch': 0.89}


 30%|██▉       | 36210/121875 [11:07:43<26:06:04,  1.10s/it]

{'loss': 0.7897, 'learning_rate': 3.5144615384615384e-05, 'epoch': 0.89}


 30%|██▉       | 36240/121875 [11:08:16<26:07:40,  1.10s/it]

{'loss': 0.7265, 'learning_rate': 3.513230769230769e-05, 'epoch': 0.89}


 30%|██▉       | 36270/121875 [11:08:49<26:02:59,  1.10s/it]

{'loss': 0.78, 'learning_rate': 3.512e-05, 'epoch': 0.89}


 30%|██▉       | 36300/121875 [11:09:22<26:13:51,  1.10s/it]

{'loss': 0.8102, 'learning_rate': 3.510769230769231e-05, 'epoch': 0.89}


 30%|██▉       | 36330/121875 [11:09:55<26:11:00,  1.10s/it]

{'loss': 0.7618, 'learning_rate': 3.509538461538462e-05, 'epoch': 0.89}


 30%|██▉       | 36360/121875 [11:10:28<26:05:37,  1.10s/it]

{'loss': 0.8064, 'learning_rate': 3.5083076923076927e-05, 'epoch': 0.9}


 30%|██▉       | 36390/121875 [11:11:01<26:08:36,  1.10s/it]

{'loss': 0.7677, 'learning_rate': 3.507076923076923e-05, 'epoch': 0.9}


 30%|██▉       | 36420/121875 [11:11:34<26:05:38,  1.10s/it]

{'loss': 0.7339, 'learning_rate': 3.505846153846154e-05, 'epoch': 0.9}


 30%|██▉       | 36450/121875 [11:12:07<26:01:05,  1.10s/it]

{'loss': 0.7591, 'learning_rate': 3.5046153846153844e-05, 'epoch': 0.9}


 30%|██▉       | 36480/121875 [11:12:40<26:06:10,  1.10s/it]

{'loss': 0.7225, 'learning_rate': 3.5033846153846156e-05, 'epoch': 0.9}


 30%|██▉       | 36510/121875 [11:13:15<26:45:07,  1.13s/it]

{'loss': 0.728, 'learning_rate': 3.502153846153846e-05, 'epoch': 0.9}


 30%|██▉       | 36540/121875 [11:13:48<26:04:08,  1.10s/it]

{'loss': 0.7618, 'learning_rate': 3.5009230769230774e-05, 'epoch': 0.9}


 30%|███       | 36570/121875 [11:14:21<26:07:28,  1.10s/it]

{'loss': 0.7265, 'learning_rate': 3.499692307692308e-05, 'epoch': 0.9}


 30%|███       | 36600/121875 [11:14:54<25:58:59,  1.10s/it]

{'loss': 0.8357, 'learning_rate': 3.4984615384615386e-05, 'epoch': 0.9}


 30%|███       | 36630/121875 [11:15:27<25:59:09,  1.10s/it]

{'loss': 0.7625, 'learning_rate': 3.497230769230769e-05, 'epoch': 0.9}


 30%|███       | 36660/121875 [11:16:00<25:56:59,  1.10s/it]

{'loss': 0.7245, 'learning_rate': 3.4960000000000004e-05, 'epoch': 0.9}


 30%|███       | 36690/121875 [11:16:33<26:03:40,  1.10s/it]

{'loss': 0.7009, 'learning_rate': 3.494769230769231e-05, 'epoch': 0.9}


 30%|███       | 36720/121875 [11:17:06<25:56:46,  1.10s/it]

{'loss': 0.7621, 'learning_rate': 3.493538461538462e-05, 'epoch': 0.9}


 30%|███       | 36750/121875 [11:17:39<25:53:01,  1.09s/it]

{'loss': 0.7354, 'learning_rate': 3.492307692307693e-05, 'epoch': 0.9}


 30%|███       | 36780/121875 [11:18:12<25:55:41,  1.10s/it]

{'loss': 0.7607, 'learning_rate': 3.491076923076923e-05, 'epoch': 0.91}


 30%|███       | 36810/121875 [11:18:45<25:56:33,  1.10s/it]

{'loss': 0.7202, 'learning_rate': 3.489846153846154e-05, 'epoch': 0.91}


 30%|███       | 36840/121875 [11:19:18<25:58:57,  1.10s/it]

{'loss': 0.7364, 'learning_rate': 3.4886153846153845e-05, 'epoch': 0.91}


 30%|███       | 36870/121875 [11:19:51<25:55:52,  1.10s/it]

{'loss': 0.7434, 'learning_rate': 3.487384615384616e-05, 'epoch': 0.91}


 30%|███       | 36900/121875 [11:20:24<25:52:55,  1.10s/it]

{'loss': 0.7101, 'learning_rate': 3.4861538461538463e-05, 'epoch': 0.91}


 30%|███       | 36930/121875 [11:20:57<25:50:17,  1.10s/it]

{'loss': 0.8175, 'learning_rate': 3.4849230769230776e-05, 'epoch': 0.91}


 30%|███       | 36960/121875 [11:21:30<25:56:04,  1.10s/it]

{'loss': 0.8134, 'learning_rate': 3.4836923076923075e-05, 'epoch': 0.91}


 30%|███       | 36990/121875 [11:22:03<25:55:48,  1.10s/it]

{'loss': 0.7292, 'learning_rate': 3.482461538461539e-05, 'epoch': 0.91}


 30%|███       | 37020/121875 [11:22:38<26:09:02,  1.11s/it]

{'loss': 0.8266, 'learning_rate': 3.481230769230769e-05, 'epoch': 0.91}


 30%|███       | 37050/121875 [11:23:12<25:58:45,  1.10s/it]

{'loss': 0.7652, 'learning_rate': 3.48e-05, 'epoch': 0.91}


 30%|███       | 37080/121875 [11:23:44<25:57:30,  1.10s/it]

{'loss': 0.6964, 'learning_rate': 3.478769230769231e-05, 'epoch': 0.91}


 30%|███       | 37110/121875 [11:24:18<25:55:33,  1.10s/it]

{'loss': 0.7624, 'learning_rate': 3.477538461538462e-05, 'epoch': 0.91}


 30%|███       | 37140/121875 [11:24:51<25:51:55,  1.10s/it]

{'loss': 0.7979, 'learning_rate': 3.476307692307692e-05, 'epoch': 0.91}


 30%|███       | 37170/121875 [11:25:24<25:44:35,  1.09s/it]

{'loss': 0.7649, 'learning_rate': 3.475076923076923e-05, 'epoch': 0.91}


 31%|███       | 37200/121875 [11:25:57<25:55:09,  1.10s/it]

{'loss': 0.8236, 'learning_rate': 3.473846153846154e-05, 'epoch': 0.92}


 31%|███       | 37230/121875 [11:26:30<25:55:16,  1.10s/it]

{'loss': 0.7983, 'learning_rate': 3.472615384615385e-05, 'epoch': 0.92}


 31%|███       | 37260/121875 [11:27:03<25:47:07,  1.10s/it]

{'loss': 0.7686, 'learning_rate': 3.471384615384616e-05, 'epoch': 0.92}


 31%|███       | 37290/121875 [11:27:36<25:46:04,  1.10s/it]

{'loss': 0.7687, 'learning_rate': 3.4701538461538465e-05, 'epoch': 0.92}


 31%|███       | 37320/121875 [11:28:09<25:42:13,  1.09s/it]

{'loss': 0.8002, 'learning_rate': 3.468923076923077e-05, 'epoch': 0.92}


 31%|███       | 37350/121875 [11:28:41<25:52:14,  1.10s/it]

{'loss': 0.7365, 'learning_rate': 3.4676923076923076e-05, 'epoch': 0.92}


 31%|███       | 37380/121875 [11:29:15<25:54:07,  1.10s/it]

{'loss': 0.7112, 'learning_rate': 3.466461538461538e-05, 'epoch': 0.92}


 31%|███       | 37410/121875 [11:29:47<25:51:36,  1.10s/it]

{'loss': 0.703, 'learning_rate': 3.4652307692307695e-05, 'epoch': 0.92}


 31%|███       | 37440/121875 [11:30:20<25:45:57,  1.10s/it]

{'loss': 0.7642, 'learning_rate': 3.464e-05, 'epoch': 0.92}


 31%|███       | 37470/121875 [11:30:53<25:49:21,  1.10s/it]

{'loss': 0.7859, 'learning_rate': 3.462769230769231e-05, 'epoch': 0.92}


 31%|███       | 37500/121875 [11:31:26<25:36:21,  1.09s/it]

{'loss': 0.7645, 'learning_rate': 3.461538461538462e-05, 'epoch': 0.92}


 31%|███       | 37530/121875 [11:32:02<25:47:04,  1.10s/it]

{'loss': 0.7336, 'learning_rate': 3.4603076923076924e-05, 'epoch': 0.92}


 31%|███       | 37560/121875 [11:32:35<25:54:18,  1.11s/it]

{'loss': 0.7245, 'learning_rate': 3.459076923076923e-05, 'epoch': 0.92}


 31%|███       | 37590/121875 [11:33:08<25:44:52,  1.10s/it]

{'loss': 0.7953, 'learning_rate': 3.4578461538461536e-05, 'epoch': 0.93}


 31%|███       | 37620/121875 [11:33:41<25:42:38,  1.10s/it]

{'loss': 0.7856, 'learning_rate': 3.456615384615385e-05, 'epoch': 0.93}


 31%|███       | 37650/121875 [11:34:14<25:41:59,  1.10s/it]

{'loss': 0.751, 'learning_rate': 3.4553846153846154e-05, 'epoch': 0.93}


 31%|███       | 37680/121875 [11:34:47<25:43:13,  1.10s/it]

{'loss': 0.7346, 'learning_rate': 3.4541538461538467e-05, 'epoch': 0.93}


 31%|███       | 37710/121875 [11:35:20<25:42:09,  1.10s/it]

{'loss': 0.78, 'learning_rate': 3.452923076923077e-05, 'epoch': 0.93}


 31%|███       | 37740/121875 [11:35:53<25:45:37,  1.10s/it]

{'loss': 0.7871, 'learning_rate': 3.451692307692308e-05, 'epoch': 0.93}


 31%|███       | 37770/121875 [11:36:26<25:48:35,  1.10s/it]

{'loss': 0.7432, 'learning_rate': 3.4504615384615384e-05, 'epoch': 0.93}


 31%|███       | 37800/121875 [11:36:59<25:40:07,  1.10s/it]

{'loss': 0.8112, 'learning_rate': 3.4492307692307696e-05, 'epoch': 0.93}


 31%|███       | 37830/121875 [11:37:32<25:43:46,  1.10s/it]

{'loss': 0.7669, 'learning_rate': 3.448e-05, 'epoch': 0.93}


 31%|███       | 37860/121875 [11:38:05<25:39:51,  1.10s/it]

{'loss': 0.7508, 'learning_rate': 3.4467692307692314e-05, 'epoch': 0.93}


 31%|███       | 37890/121875 [11:38:38<25:46:38,  1.10s/it]

{'loss': 0.8354, 'learning_rate': 3.445538461538462e-05, 'epoch': 0.93}


 31%|███       | 37920/121875 [11:39:11<25:37:38,  1.10s/it]

{'loss': 0.7782, 'learning_rate': 3.444307692307692e-05, 'epoch': 0.93}


 31%|███       | 37950/121875 [11:39:44<25:42:39,  1.10s/it]

{'loss': 0.7746, 'learning_rate': 3.443076923076923e-05, 'epoch': 0.93}


 31%|███       | 37980/121875 [11:40:17<25:35:46,  1.10s/it]

{'loss': 0.7132, 'learning_rate': 3.441846153846154e-05, 'epoch': 0.93}


 31%|███       | 38010/121875 [11:40:52<26:15:38,  1.13s/it]

{'loss': 0.7561, 'learning_rate': 3.440615384615385e-05, 'epoch': 0.94}


 31%|███       | 38040/121875 [11:41:25<25:41:40,  1.10s/it]

{'loss': 0.7848, 'learning_rate': 3.4393846153846156e-05, 'epoch': 0.94}


 31%|███       | 38070/121875 [11:41:58<25:30:07,  1.10s/it]

{'loss': 0.7686, 'learning_rate': 3.438153846153847e-05, 'epoch': 0.94}


 31%|███▏      | 38100/121875 [11:42:31<24:58:17,  1.07s/it]

{'loss': 0.7577, 'learning_rate': 3.436923076923077e-05, 'epoch': 0.94}


 31%|███▏      | 38130/121875 [11:43:04<25:40:25,  1.10s/it]

{'loss': 0.7347, 'learning_rate': 3.435692307692308e-05, 'epoch': 0.94}


 31%|███▏      | 38160/121875 [11:43:37<25:29:05,  1.10s/it]

{'loss': 0.8015, 'learning_rate': 3.4344615384615385e-05, 'epoch': 0.94}


 31%|███▏      | 38190/121875 [11:44:10<25:30:43,  1.10s/it]

{'loss': 0.7456, 'learning_rate': 3.433230769230769e-05, 'epoch': 0.94}


 31%|███▏      | 38220/121875 [11:44:43<25:30:42,  1.10s/it]

{'loss': 0.8078, 'learning_rate': 3.4320000000000003e-05, 'epoch': 0.94}


 31%|███▏      | 38250/121875 [11:45:16<25:30:31,  1.10s/it]

{'loss': 0.7372, 'learning_rate': 3.430769230769231e-05, 'epoch': 0.94}


 31%|███▏      | 38280/121875 [11:45:49<25:37:07,  1.10s/it]

{'loss': 0.7394, 'learning_rate': 3.4295384615384615e-05, 'epoch': 0.94}


 31%|███▏      | 38310/121875 [11:46:22<25:38:01,  1.10s/it]

{'loss': 0.792, 'learning_rate': 3.428307692307692e-05, 'epoch': 0.94}


 31%|███▏      | 38340/121875 [11:46:55<25:33:52,  1.10s/it]

{'loss': 0.7434, 'learning_rate': 3.427076923076923e-05, 'epoch': 0.94}


 31%|███▏      | 38370/121875 [11:47:28<25:29:19,  1.10s/it]

{'loss': 0.707, 'learning_rate': 3.425846153846154e-05, 'epoch': 0.94}


 32%|███▏      | 38400/121875 [11:48:01<25:35:45,  1.10s/it]

{'loss': 0.7817, 'learning_rate': 3.424615384615385e-05, 'epoch': 0.95}


 32%|███▏      | 38430/121875 [11:48:34<25:24:32,  1.10s/it]

{'loss': 0.8014, 'learning_rate': 3.423384615384616e-05, 'epoch': 0.95}


 32%|███▏      | 38460/121875 [11:49:07<25:30:09,  1.10s/it]

{'loss': 0.7742, 'learning_rate': 3.422153846153846e-05, 'epoch': 0.95}


 32%|███▏      | 38490/121875 [11:49:40<25:27:18,  1.10s/it]

{'loss': 0.7696, 'learning_rate': 3.420923076923077e-05, 'epoch': 0.95}


 32%|███▏      | 38520/121875 [11:50:15<25:32:19,  1.10s/it]

{'loss': 0.7977, 'learning_rate': 3.4196923076923074e-05, 'epoch': 0.95}


 32%|███▏      | 38550/121875 [11:50:48<25:30:17,  1.10s/it]

{'loss': 0.7973, 'learning_rate': 3.418461538461539e-05, 'epoch': 0.95}


 32%|███▏      | 38580/121875 [11:51:21<25:28:09,  1.10s/it]

{'loss': 0.7581, 'learning_rate': 3.417230769230769e-05, 'epoch': 0.95}


 32%|███▏      | 38610/121875 [11:51:54<25:13:56,  1.09s/it]

{'loss': 0.7546, 'learning_rate': 3.4160000000000005e-05, 'epoch': 0.95}


 32%|███▏      | 38640/121875 [11:52:27<25:25:31,  1.10s/it]

{'loss': 0.6981, 'learning_rate': 3.414769230769231e-05, 'epoch': 0.95}


 32%|███▏      | 38670/121875 [11:53:00<25:20:46,  1.10s/it]

{'loss': 0.8088, 'learning_rate': 3.4135384615384616e-05, 'epoch': 0.95}


 32%|███▏      | 38700/121875 [11:53:33<25:33:23,  1.11s/it]

{'loss': 0.7417, 'learning_rate': 3.412307692307692e-05, 'epoch': 0.95}


 32%|███▏      | 38730/121875 [11:54:06<25:28:57,  1.10s/it]

{'loss': 0.7719, 'learning_rate': 3.4110769230769235e-05, 'epoch': 0.95}


 32%|███▏      | 38760/121875 [11:54:39<25:17:15,  1.10s/it]

{'loss': 0.7327, 'learning_rate': 3.409846153846154e-05, 'epoch': 0.95}


 32%|███▏      | 38790/121875 [11:55:12<25:23:15,  1.10s/it]

{'loss': 0.7772, 'learning_rate': 3.4086153846153846e-05, 'epoch': 0.95}


 32%|███▏      | 38820/121875 [11:55:45<25:20:56,  1.10s/it]

{'loss': 0.8436, 'learning_rate': 3.407384615384616e-05, 'epoch': 0.96}


 32%|███▏      | 38850/121875 [11:56:18<25:24:01,  1.10s/it]

{'loss': 0.7813, 'learning_rate': 3.4061538461538464e-05, 'epoch': 0.96}


 32%|███▏      | 38880/121875 [11:56:51<25:18:43,  1.10s/it]

{'loss': 0.6942, 'learning_rate': 3.404923076923077e-05, 'epoch': 0.96}


 32%|███▏      | 38910/121875 [11:57:24<25:24:48,  1.10s/it]

{'loss': 0.7708, 'learning_rate': 3.4036923076923076e-05, 'epoch': 0.96}


 32%|███▏      | 38940/121875 [11:57:57<25:13:42,  1.10s/it]

{'loss': 0.7396, 'learning_rate': 3.402461538461539e-05, 'epoch': 0.96}


 32%|███▏      | 38970/121875 [11:58:30<25:20:10,  1.10s/it]

{'loss': 0.8108, 'learning_rate': 3.4012307692307694e-05, 'epoch': 0.96}


 32%|███▏      | 39000/121875 [11:59:03<25:19:09,  1.10s/it]

{'loss': 0.7103, 'learning_rate': 3.4000000000000007e-05, 'epoch': 0.96}


 32%|███▏      | 39030/121875 [11:59:38<25:17:55,  1.10s/it]

{'loss': 0.7852, 'learning_rate': 3.398769230769231e-05, 'epoch': 0.96}


 32%|███▏      | 39060/121875 [12:00:11<25:13:23,  1.10s/it]

{'loss': 0.7167, 'learning_rate': 3.397538461538461e-05, 'epoch': 0.96}


 32%|███▏      | 39090/121875 [12:00:44<25:21:47,  1.10s/it]

{'loss': 0.7643, 'learning_rate': 3.3963076923076924e-05, 'epoch': 0.96}


 32%|███▏      | 39120/121875 [12:01:17<25:23:25,  1.10s/it]

{'loss': 0.8438, 'learning_rate': 3.395076923076923e-05, 'epoch': 0.96}


 32%|███▏      | 39150/121875 [12:01:50<25:17:41,  1.10s/it]

{'loss': 0.7956, 'learning_rate': 3.393846153846154e-05, 'epoch': 0.96}


 32%|███▏      | 39180/121875 [12:02:23<25:24:40,  1.11s/it]

{'loss': 0.7738, 'learning_rate': 3.392615384615385e-05, 'epoch': 0.96}


 32%|███▏      | 39210/121875 [12:02:56<25:13:24,  1.10s/it]

{'loss': 0.796, 'learning_rate': 3.391384615384616e-05, 'epoch': 0.97}


 32%|███▏      | 39240/121875 [12:03:29<25:16:06,  1.10s/it]

{'loss': 0.8087, 'learning_rate': 3.390153846153846e-05, 'epoch': 0.97}


 32%|███▏      | 39270/121875 [12:04:02<25:06:50,  1.09s/it]

{'loss': 0.7824, 'learning_rate': 3.388923076923077e-05, 'epoch': 0.97}


 32%|███▏      | 39300/121875 [12:04:35<25:08:27,  1.10s/it]

{'loss': 0.7912, 'learning_rate': 3.387692307692308e-05, 'epoch': 0.97}


 32%|███▏      | 39330/121875 [12:05:08<25:07:49,  1.10s/it]

{'loss': 0.7084, 'learning_rate': 3.386461538461538e-05, 'epoch': 0.97}


 32%|███▏      | 39360/121875 [12:05:41<25:05:31,  1.09s/it]

{'loss': 0.8113, 'learning_rate': 3.3852307692307696e-05, 'epoch': 0.97}


 32%|███▏      | 39390/121875 [12:06:14<25:11:49,  1.10s/it]

{'loss': 0.7197, 'learning_rate': 3.384e-05, 'epoch': 0.97}


 32%|███▏      | 39420/121875 [12:06:47<25:07:46,  1.10s/it]

{'loss': 0.7677, 'learning_rate': 3.382769230769231e-05, 'epoch': 0.97}


 32%|███▏      | 39450/121875 [12:07:20<25:12:33,  1.10s/it]

{'loss': 0.7811, 'learning_rate': 3.381538461538461e-05, 'epoch': 0.97}


 32%|███▏      | 39480/121875 [12:07:53<25:11:16,  1.10s/it]

{'loss': 0.7163, 'learning_rate': 3.3803076923076925e-05, 'epoch': 0.97}


 32%|███▏      | 39510/121875 [12:08:28<25:43:23,  1.12s/it]

{'loss': 0.7643, 'learning_rate': 3.379076923076923e-05, 'epoch': 0.97}


 32%|███▏      | 39540/121875 [12:09:01<25:00:43,  1.09s/it]

{'loss': 0.733, 'learning_rate': 3.3778461538461543e-05, 'epoch': 0.97}


 32%|███▏      | 39570/121875 [12:09:34<25:13:31,  1.10s/it]

{'loss': 0.7676, 'learning_rate': 3.376615384615385e-05, 'epoch': 0.97}


 32%|███▏      | 39600/121875 [12:10:07<25:10:12,  1.10s/it]

{'loss': 0.7461, 'learning_rate': 3.3753846153846155e-05, 'epoch': 0.97}


 33%|███▎      | 39630/121875 [12:10:40<25:11:57,  1.10s/it]

{'loss': 0.7452, 'learning_rate': 3.374153846153846e-05, 'epoch': 0.98}


 33%|███▎      | 39660/121875 [12:11:13<25:12:59,  1.10s/it]

{'loss': 0.7667, 'learning_rate': 3.3729230769230766e-05, 'epoch': 0.98}


 33%|███▎      | 39690/121875 [12:11:46<25:05:57,  1.10s/it]

{'loss': 0.7925, 'learning_rate': 3.371692307692308e-05, 'epoch': 0.98}


 33%|███▎      | 39720/121875 [12:12:19<25:04:19,  1.10s/it]

{'loss': 0.8336, 'learning_rate': 3.3704615384615385e-05, 'epoch': 0.98}


 33%|███▎      | 39750/121875 [12:12:52<25:05:32,  1.10s/it]

{'loss': 0.7672, 'learning_rate': 3.36923076923077e-05, 'epoch': 0.98}


 33%|███▎      | 39780/121875 [12:13:25<25:08:06,  1.10s/it]

{'loss': 0.7371, 'learning_rate': 3.368e-05, 'epoch': 0.98}


 33%|███▎      | 39810/121875 [12:13:58<25:00:41,  1.10s/it]

{'loss': 0.7703, 'learning_rate': 3.366769230769231e-05, 'epoch': 0.98}


 33%|███▎      | 39840/121875 [12:14:31<25:05:03,  1.10s/it]

{'loss': 0.7971, 'learning_rate': 3.3655384615384614e-05, 'epoch': 0.98}


 33%|███▎      | 39870/121875 [12:15:04<25:04:52,  1.10s/it]

{'loss': 0.8296, 'learning_rate': 3.364307692307693e-05, 'epoch': 0.98}


 33%|███▎      | 39900/121875 [12:15:37<25:06:23,  1.10s/it]

{'loss': 0.7628, 'learning_rate': 3.363076923076923e-05, 'epoch': 0.98}


 33%|███▎      | 39930/121875 [12:16:10<24:59:53,  1.10s/it]

{'loss': 0.7455, 'learning_rate': 3.361846153846154e-05, 'epoch': 0.98}


 33%|███▎      | 39960/121875 [12:16:43<24:50:59,  1.09s/it]

{'loss': 0.7216, 'learning_rate': 3.360615384615385e-05, 'epoch': 0.98}


 33%|███▎      | 39990/121875 [12:17:16<24:55:27,  1.10s/it]

{'loss': 0.7905, 'learning_rate': 3.3593846153846156e-05, 'epoch': 0.98}


 33%|███▎      | 40020/121875 [12:17:51<25:01:54,  1.10s/it]

{'loss': 0.7139, 'learning_rate': 3.358153846153846e-05, 'epoch': 0.99}


 33%|███▎      | 40050/121875 [12:18:24<25:00:17,  1.10s/it]

{'loss': 0.8226, 'learning_rate': 3.356923076923077e-05, 'epoch': 0.99}


 33%|███▎      | 40080/121875 [12:18:57<24:44:26,  1.09s/it]

{'loss': 0.7225, 'learning_rate': 3.355692307692308e-05, 'epoch': 0.99}


 33%|███▎      | 40110/121875 [12:19:30<24:50:06,  1.09s/it]

{'loss': 0.816, 'learning_rate': 3.3544615384615386e-05, 'epoch': 0.99}


 33%|███▎      | 40140/121875 [12:20:03<25:01:59,  1.10s/it]

{'loss': 0.7552, 'learning_rate': 3.35323076923077e-05, 'epoch': 0.99}


 33%|███▎      | 40170/121875 [12:20:36<24:59:23,  1.10s/it]

{'loss': 0.7275, 'learning_rate': 3.3520000000000004e-05, 'epoch': 0.99}


 33%|███▎      | 40200/121875 [12:21:09<24:59:39,  1.10s/it]

{'loss': 0.6678, 'learning_rate': 3.35076923076923e-05, 'epoch': 0.99}


 33%|███▎      | 40230/121875 [12:21:42<24:58:00,  1.10s/it]

{'loss': 0.7739, 'learning_rate': 3.3495384615384616e-05, 'epoch': 0.99}


 33%|███▎      | 40260/121875 [12:22:15<24:59:29,  1.10s/it]

{'loss': 0.7642, 'learning_rate': 3.348307692307692e-05, 'epoch': 0.99}


 33%|███▎      | 40290/121875 [12:22:48<24:57:13,  1.10s/it]

{'loss': 0.7505, 'learning_rate': 3.3470769230769234e-05, 'epoch': 0.99}


 33%|███▎      | 40320/121875 [12:23:21<24:53:58,  1.10s/it]

{'loss': 0.8581, 'learning_rate': 3.345846153846154e-05, 'epoch': 0.99}


 33%|███▎      | 40350/121875 [12:23:54<24:51:19,  1.10s/it]

{'loss': 0.843, 'learning_rate': 3.344615384615385e-05, 'epoch': 0.99}


 33%|███▎      | 40380/121875 [12:24:27<24:55:45,  1.10s/it]

{'loss': 0.7477, 'learning_rate': 3.343384615384615e-05, 'epoch': 0.99}


 33%|███▎      | 40410/121875 [12:25:00<24:53:59,  1.10s/it]

{'loss': 0.7092, 'learning_rate': 3.3421538461538464e-05, 'epoch': 0.99}


 33%|███▎      | 40440/121875 [12:25:33<24:55:59,  1.10s/it]

{'loss': 0.7816, 'learning_rate': 3.340923076923077e-05, 'epoch': 1.0}


 33%|███▎      | 40470/121875 [12:26:06<25:01:14,  1.11s/it]

{'loss': 0.7587, 'learning_rate': 3.339692307692308e-05, 'epoch': 1.0}


 33%|███▎      | 40500/121875 [12:26:39<24:48:41,  1.10s/it]

{'loss': 0.8114, 'learning_rate': 3.338461538461539e-05, 'epoch': 1.0}


 33%|███▎      | 40530/121875 [12:27:15<24:50:38,  1.10s/it]

{'loss': 0.794, 'learning_rate': 3.337230769230769e-05, 'epoch': 1.0}


 33%|███▎      | 40560/121875 [12:27:48<24:52:13,  1.10s/it]

{'loss': 0.7297, 'learning_rate': 3.336e-05, 'epoch': 1.0}


 33%|███▎      | 40590/121875 [12:28:21<24:43:14,  1.09s/it]

{'loss': 0.8403, 'learning_rate': 3.3347692307692305e-05, 'epoch': 1.0}


 33%|███▎      | 40620/121875 [12:28:54<24:52:19,  1.10s/it]

{'loss': 0.756, 'learning_rate': 3.333538461538462e-05, 'epoch': 1.0}


 33%|███▎      | 40625/121875 [12:54:24<24:56:29,  1.11s/it]

{'eval_loss': 0.7547432780265808, 'eval_accuracy': 0.67004, 'eval_runtime': 1524.9202, 'eval_samples_per_second': 32.789, 'eval_steps_per_second': 4.099, 'epoch': 1.0}


 33%|███▎      | 40650/121875 [12:54:53<26:40:37,  1.18s/it]

{'loss': 0.7317, 'learning_rate': 3.332307692307692e-05, 'epoch': 1.0}


 33%|███▎      | 40680/121875 [12:55:26<24:41:04,  1.09s/it]

{'loss': 0.7176, 'learning_rate': 3.3310769230769236e-05, 'epoch': 1.0}


 33%|███▎      | 40710/121875 [12:55:59<24:49:04,  1.10s/it]

{'loss': 0.7158, 'learning_rate': 3.329846153846154e-05, 'epoch': 1.0}


 33%|███▎      | 40740/121875 [12:56:32<24:43:31,  1.10s/it]

{'loss': 0.7013, 'learning_rate': 3.328615384615385e-05, 'epoch': 1.0}


 33%|███▎      | 40770/121875 [12:57:05<24:45:23,  1.10s/it]

{'loss': 0.7049, 'learning_rate': 3.327384615384615e-05, 'epoch': 1.0}


 33%|███▎      | 40800/121875 [12:57:38<24:45:19,  1.10s/it]

{'loss': 0.7574, 'learning_rate': 3.326153846153846e-05, 'epoch': 1.0}


 34%|███▎      | 40830/121875 [12:58:11<24:47:03,  1.10s/it]

{'loss': 0.6302, 'learning_rate': 3.324923076923077e-05, 'epoch': 1.01}


 34%|███▎      | 40860/121875 [12:58:44<24:53:12,  1.11s/it]

{'loss': 0.68, 'learning_rate': 3.323692307692308e-05, 'epoch': 1.01}


 34%|███▎      | 40890/121875 [12:59:17<24:36:59,  1.09s/it]

{'loss': 0.6763, 'learning_rate': 3.322461538461539e-05, 'epoch': 1.01}


 34%|███▎      | 40920/121875 [12:59:50<24:47:47,  1.10s/it]

{'loss': 0.841, 'learning_rate': 3.3212307692307695e-05, 'epoch': 1.01}


 34%|███▎      | 40950/121875 [13:00:23<24:38:07,  1.10s/it]

{'loss': 0.6614, 'learning_rate': 3.32e-05, 'epoch': 1.01}


 34%|███▎      | 40980/121875 [13:00:56<24:44:33,  1.10s/it]

{'loss': 0.7499, 'learning_rate': 3.3187692307692306e-05, 'epoch': 1.01}


 34%|███▎      | 41010/121875 [13:01:32<25:25:58,  1.13s/it]

{'loss': 0.7599, 'learning_rate': 3.317538461538462e-05, 'epoch': 1.01}


 34%|███▎      | 41040/121875 [13:02:05<24:42:56,  1.10s/it]

{'loss': 0.7293, 'learning_rate': 3.3163076923076925e-05, 'epoch': 1.01}


 34%|███▎      | 41070/121875 [13:02:38<24:42:00,  1.10s/it]

{'loss': 0.7606, 'learning_rate': 3.315076923076923e-05, 'epoch': 1.01}


 34%|███▎      | 41100/121875 [13:03:11<24:40:44,  1.10s/it]

{'loss': 0.7081, 'learning_rate': 3.313846153846154e-05, 'epoch': 1.01}


 34%|███▎      | 41130/121875 [13:03:44<24:43:31,  1.10s/it]

{'loss': 0.6889, 'learning_rate': 3.312615384615385e-05, 'epoch': 1.01}


 34%|███▍      | 41160/121875 [13:04:17<24:46:53,  1.11s/it]

{'loss': 0.6773, 'learning_rate': 3.3113846153846154e-05, 'epoch': 1.01}


 34%|███▍      | 41190/121875 [13:04:50<24:39:08,  1.10s/it]

{'loss': 0.7797, 'learning_rate': 3.310153846153846e-05, 'epoch': 1.01}


 34%|███▍      | 41220/121875 [13:05:22<24:26:06,  1.09s/it]

{'loss': 0.6906, 'learning_rate': 3.308923076923077e-05, 'epoch': 1.01}


 34%|███▍      | 41250/121875 [13:05:55<24:39:09,  1.10s/it]

{'loss': 0.7091, 'learning_rate': 3.307692307692308e-05, 'epoch': 1.02}


 34%|███▍      | 41280/121875 [13:06:28<24:31:29,  1.10s/it]

{'loss': 0.6719, 'learning_rate': 3.306461538461539e-05, 'epoch': 1.02}


 34%|███▍      | 41310/121875 [13:07:01<24:31:34,  1.10s/it]

{'loss': 0.7072, 'learning_rate': 3.3052307692307696e-05, 'epoch': 1.02}


 34%|███▍      | 41340/121875 [13:07:34<24:35:34,  1.10s/it]

{'loss': 0.6679, 'learning_rate': 3.304e-05, 'epoch': 1.02}


 34%|███▍      | 41370/121875 [13:08:07<24:30:52,  1.10s/it]

{'loss': 0.7045, 'learning_rate': 3.302769230769231e-05, 'epoch': 1.02}


 34%|███▍      | 41400/121875 [13:08:40<24:35:37,  1.10s/it]

{'loss': 0.704, 'learning_rate': 3.3015384615384614e-05, 'epoch': 1.02}


 34%|███▍      | 41430/121875 [13:09:13<24:28:31,  1.10s/it]

{'loss': 0.7697, 'learning_rate': 3.3003076923076926e-05, 'epoch': 1.02}


 34%|███▍      | 41460/121875 [13:09:46<24:27:43,  1.10s/it]

{'loss': 0.7447, 'learning_rate': 3.299076923076923e-05, 'epoch': 1.02}


 34%|███▍      | 41490/121875 [13:10:19<24:24:14,  1.09s/it]

{'loss': 0.7015, 'learning_rate': 3.2978461538461544e-05, 'epoch': 1.02}


 34%|███▍      | 41520/121875 [13:10:55<24:39:43,  1.10s/it]

{'loss': 0.7255, 'learning_rate': 3.296615384615384e-05, 'epoch': 1.02}


 34%|███▍      | 41550/121875 [13:11:28<24:31:01,  1.10s/it]

{'loss': 0.7215, 'learning_rate': 3.2953846153846156e-05, 'epoch': 1.02}


 34%|███▍      | 41580/121875 [13:12:01<24:28:36,  1.10s/it]

{'loss': 0.7763, 'learning_rate': 3.294153846153846e-05, 'epoch': 1.02}


 34%|███▍      | 41610/121875 [13:12:34<24:29:36,  1.10s/it]

{'loss': 0.6896, 'learning_rate': 3.2929230769230774e-05, 'epoch': 1.02}


 34%|███▍      | 41640/121875 [13:13:07<24:34:59,  1.10s/it]

{'loss': 0.7052, 'learning_rate': 3.291692307692308e-05, 'epoch': 1.02}


 34%|███▍      | 41670/121875 [13:13:40<24:32:32,  1.10s/it]

{'loss': 0.694, 'learning_rate': 3.2904615384615385e-05, 'epoch': 1.03}


 34%|███▍      | 41700/121875 [13:14:13<24:24:02,  1.10s/it]

{'loss': 0.686, 'learning_rate': 3.289230769230769e-05, 'epoch': 1.03}


 34%|███▍      | 41730/121875 [13:14:46<24:29:34,  1.10s/it]

{'loss': 0.7106, 'learning_rate': 3.288e-05, 'epoch': 1.03}


 34%|███▍      | 41760/121875 [13:15:19<24:30:34,  1.10s/it]

{'loss': 0.689, 'learning_rate': 3.286769230769231e-05, 'epoch': 1.03}


 34%|███▍      | 41790/121875 [13:15:52<24:27:40,  1.10s/it]

{'loss': 0.7126, 'learning_rate': 3.2855384615384615e-05, 'epoch': 1.03}


 34%|███▍      | 41820/121875 [13:16:25<24:36:34,  1.11s/it]

{'loss': 0.6709, 'learning_rate': 3.284307692307693e-05, 'epoch': 1.03}


 34%|███▍      | 41850/121875 [13:16:58<24:28:49,  1.10s/it]

{'loss': 0.7246, 'learning_rate': 3.283076923076923e-05, 'epoch': 1.03}


 34%|███▍      | 41880/121875 [13:17:31<24:29:14,  1.10s/it]

{'loss': 0.6881, 'learning_rate': 3.281846153846154e-05, 'epoch': 1.03}


 34%|███▍      | 41910/121875 [13:18:04<24:27:12,  1.10s/it]

{'loss': 0.7624, 'learning_rate': 3.2806153846153845e-05, 'epoch': 1.03}


 34%|███▍      | 41940/121875 [13:18:37<24:26:35,  1.10s/it]

{'loss': 0.75, 'learning_rate': 3.279384615384615e-05, 'epoch': 1.03}


 34%|███▍      | 41970/121875 [13:19:10<24:16:13,  1.09s/it]

{'loss': 0.7006, 'learning_rate': 3.278153846153846e-05, 'epoch': 1.03}


 34%|███▍      | 42000/121875 [13:19:43<24:28:10,  1.10s/it]

{'loss': 0.7897, 'learning_rate': 3.276923076923077e-05, 'epoch': 1.03}


 34%|███▍      | 42030/121875 [13:20:18<24:22:28,  1.10s/it]

{'loss': 0.6911, 'learning_rate': 3.275692307692308e-05, 'epoch': 1.03}


 35%|███▍      | 42060/121875 [13:20:51<24:22:36,  1.10s/it]

{'loss': 0.7651, 'learning_rate': 3.274461538461539e-05, 'epoch': 1.04}


 35%|███▍      | 42090/121875 [13:21:24<24:26:17,  1.10s/it]

{'loss': 0.7307, 'learning_rate': 3.273230769230769e-05, 'epoch': 1.04}


 35%|███▍      | 42120/121875 [13:21:57<24:25:18,  1.10s/it]

{'loss': 0.7803, 'learning_rate': 3.272e-05, 'epoch': 1.04}


 35%|███▍      | 42150/121875 [13:22:30<24:24:19,  1.10s/it]

{'loss': 0.6803, 'learning_rate': 3.270769230769231e-05, 'epoch': 1.04}


 35%|███▍      | 42180/121875 [13:23:03<24:19:49,  1.10s/it]

{'loss': 0.6891, 'learning_rate': 3.269538461538462e-05, 'epoch': 1.04}


 35%|███▍      | 42210/121875 [13:23:36<24:13:48,  1.09s/it]

{'loss': 0.7152, 'learning_rate': 3.268307692307693e-05, 'epoch': 1.04}


 35%|███▍      | 42240/121875 [13:24:09<24:21:12,  1.10s/it]

{'loss': 0.75, 'learning_rate': 3.2670769230769235e-05, 'epoch': 1.04}


 35%|███▍      | 42270/121875 [13:24:42<24:22:04,  1.10s/it]

{'loss': 0.7835, 'learning_rate': 3.265846153846154e-05, 'epoch': 1.04}


 35%|███▍      | 42300/121875 [13:25:15<24:15:34,  1.10s/it]

{'loss': 0.7055, 'learning_rate': 3.2646153846153846e-05, 'epoch': 1.04}


 35%|███▍      | 42330/121875 [13:25:48<24:19:05,  1.10s/it]

{'loss': 0.7041, 'learning_rate': 3.263384615384615e-05, 'epoch': 1.04}


 35%|███▍      | 42360/121875 [13:26:21<24:16:21,  1.10s/it]

{'loss': 0.7457, 'learning_rate': 3.2621538461538465e-05, 'epoch': 1.04}


 35%|███▍      | 42390/121875 [13:26:54<24:16:23,  1.10s/it]

{'loss': 0.7238, 'learning_rate': 3.260923076923077e-05, 'epoch': 1.04}


 35%|███▍      | 42420/121875 [13:27:27<24:22:39,  1.10s/it]

{'loss': 0.6757, 'learning_rate': 3.259692307692308e-05, 'epoch': 1.04}


 35%|███▍      | 42450/121875 [13:28:00<24:18:31,  1.10s/it]

{'loss': 0.7875, 'learning_rate': 3.258461538461539e-05, 'epoch': 1.04}


 35%|███▍      | 42480/121875 [13:28:33<24:13:35,  1.10s/it]

{'loss': 0.6963, 'learning_rate': 3.2572307692307694e-05, 'epoch': 1.05}


 35%|███▍      | 42510/121875 [13:29:08<24:47:50,  1.12s/it]

{'loss': 0.6914, 'learning_rate': 3.256e-05, 'epoch': 1.05}


 35%|███▍      | 42540/121875 [13:29:41<24:18:40,  1.10s/it]

{'loss': 0.7442, 'learning_rate': 3.2547692307692306e-05, 'epoch': 1.05}


 35%|███▍      | 42570/121875 [13:30:14<24:11:01,  1.10s/it]

{'loss': 0.6897, 'learning_rate': 3.253538461538462e-05, 'epoch': 1.05}


 35%|███▍      | 42600/121875 [13:30:47<24:16:44,  1.10s/it]

{'loss': 0.6904, 'learning_rate': 3.2523076923076924e-05, 'epoch': 1.05}


 35%|███▍      | 42630/121875 [13:31:21<24:09:33,  1.10s/it]

{'loss': 0.6961, 'learning_rate': 3.2510769230769236e-05, 'epoch': 1.05}


 35%|███▌      | 42660/121875 [13:31:54<24:14:24,  1.10s/it]

{'loss': 0.6985, 'learning_rate': 3.2498461538461535e-05, 'epoch': 1.05}


 35%|███▌      | 42690/121875 [13:32:26<24:06:31,  1.10s/it]

{'loss': 0.7204, 'learning_rate': 3.248615384615385e-05, 'epoch': 1.05}


 35%|███▌      | 42720/121875 [13:32:59<24:10:02,  1.10s/it]

{'loss': 0.7147, 'learning_rate': 3.2473846153846154e-05, 'epoch': 1.05}


 35%|███▌      | 42750/121875 [13:33:32<24:15:30,  1.10s/it]

{'loss': 0.6973, 'learning_rate': 3.2461538461538466e-05, 'epoch': 1.05}


 35%|███▌      | 42780/121875 [13:34:06<24:05:34,  1.10s/it]

{'loss': 0.7024, 'learning_rate': 3.244923076923077e-05, 'epoch': 1.05}


 35%|███▌      | 42810/121875 [13:34:39<24:14:44,  1.10s/it]

{'loss': 0.7095, 'learning_rate': 3.243692307692308e-05, 'epoch': 1.05}


 35%|███▌      | 42840/121875 [13:35:12<24:05:02,  1.10s/it]

{'loss': 0.684, 'learning_rate': 3.242461538461538e-05, 'epoch': 1.05}


 35%|███▌      | 42870/121875 [13:35:45<24:09:01,  1.10s/it]

{'loss': 0.6949, 'learning_rate': 3.241230769230769e-05, 'epoch': 1.06}


 35%|███▌      | 42900/121875 [13:36:18<23:58:49,  1.09s/it]

{'loss': 0.7257, 'learning_rate': 3.24e-05, 'epoch': 1.06}


 35%|███▌      | 42930/121875 [13:36:51<24:06:47,  1.10s/it]

{'loss': 0.7491, 'learning_rate': 3.238769230769231e-05, 'epoch': 1.06}


 35%|███▌      | 42960/121875 [13:37:23<24:10:52,  1.10s/it]

{'loss': 0.7576, 'learning_rate': 3.237538461538462e-05, 'epoch': 1.06}


 35%|███▌      | 42990/121875 [13:37:56<23:59:35,  1.09s/it]

{'loss': 0.7704, 'learning_rate': 3.2363076923076925e-05, 'epoch': 1.06}


 35%|███▌      | 43020/121875 [13:38:32<24:00:31,  1.10s/it]

{'loss': 0.757, 'learning_rate': 3.235076923076923e-05, 'epoch': 1.06}


 35%|███▌      | 43050/121875 [13:39:05<24:00:00,  1.10s/it]

{'loss': 0.6845, 'learning_rate': 3.233846153846154e-05, 'epoch': 1.06}


 35%|███▌      | 43080/121875 [13:39:38<24:03:33,  1.10s/it]

{'loss': 0.7401, 'learning_rate': 3.232615384615385e-05, 'epoch': 1.06}


 35%|███▌      | 43110/121875 [13:40:10<24:06:51,  1.10s/it]

{'loss': 0.7221, 'learning_rate': 3.2313846153846155e-05, 'epoch': 1.06}


 35%|███▌      | 43140/121875 [13:40:43<24:10:16,  1.11s/it]

{'loss': 0.7187, 'learning_rate': 3.230153846153846e-05, 'epoch': 1.06}


 35%|███▌      | 43170/121875 [13:41:17<24:01:05,  1.10s/it]

{'loss': 0.7433, 'learning_rate': 3.228923076923077e-05, 'epoch': 1.06}


 35%|███▌      | 43200/121875 [13:41:50<24:05:08,  1.10s/it]

{'loss': 0.6675, 'learning_rate': 3.227692307692308e-05, 'epoch': 1.06}


 35%|███▌      | 43230/121875 [13:42:23<23:57:02,  1.10s/it]

{'loss': 0.7029, 'learning_rate': 3.2264615384615385e-05, 'epoch': 1.06}


 35%|███▌      | 43260/121875 [13:42:56<24:01:27,  1.10s/it]

{'loss': 0.7085, 'learning_rate': 3.225230769230769e-05, 'epoch': 1.06}


 36%|███▌      | 43290/121875 [13:43:29<23:59:10,  1.10s/it]

{'loss': 0.7546, 'learning_rate': 3.224e-05, 'epoch': 1.07}


 36%|███▌      | 43320/121875 [13:44:02<24:02:07,  1.10s/it]

{'loss': 0.673, 'learning_rate': 3.222769230769231e-05, 'epoch': 1.07}


 36%|███▌      | 43350/121875 [13:44:35<23:55:58,  1.10s/it]

{'loss': 0.7321, 'learning_rate': 3.221538461538462e-05, 'epoch': 1.07}


 36%|███▌      | 43380/121875 [13:45:08<23:53:36,  1.10s/it]

{'loss': 0.728, 'learning_rate': 3.220307692307693e-05, 'epoch': 1.07}


 36%|███▌      | 43410/121875 [13:45:41<23:45:58,  1.09s/it]

{'loss': 0.7417, 'learning_rate': 3.219076923076923e-05, 'epoch': 1.07}


 36%|███▌      | 43440/121875 [13:46:14<23:59:18,  1.10s/it]

{'loss': 0.7294, 'learning_rate': 3.217846153846154e-05, 'epoch': 1.07}


 36%|███▌      | 43470/121875 [13:46:47<24:06:29,  1.11s/it]

{'loss': 0.6628, 'learning_rate': 3.2166153846153844e-05, 'epoch': 1.07}


 36%|███▌      | 43500/121875 [13:47:20<23:53:55,  1.10s/it]

{'loss': 0.7887, 'learning_rate': 3.215384615384616e-05, 'epoch': 1.07}


 36%|███▌      | 43530/121875 [13:47:55<23:52:12,  1.10s/it]

{'loss': 0.6426, 'learning_rate': 3.214153846153846e-05, 'epoch': 1.07}


 36%|███▌      | 43560/121875 [13:48:28<23:57:55,  1.10s/it]

{'loss': 0.7724, 'learning_rate': 3.2129230769230775e-05, 'epoch': 1.07}


 36%|███▌      | 43590/121875 [13:49:01<23:55:26,  1.10s/it]

{'loss': 0.7515, 'learning_rate': 3.211692307692308e-05, 'epoch': 1.07}


 36%|███▌      | 43620/121875 [13:49:34<23:56:09,  1.10s/it]

{'loss': 0.669, 'learning_rate': 3.2104615384615386e-05, 'epoch': 1.07}


 36%|███▌      | 43650/121875 [13:50:07<23:57:52,  1.10s/it]

{'loss': 0.7646, 'learning_rate': 3.209230769230769e-05, 'epoch': 1.07}


 36%|███▌      | 43680/121875 [13:50:40<24:03:29,  1.11s/it]

{'loss': 0.7353, 'learning_rate': 3.208e-05, 'epoch': 1.08}


 36%|███▌      | 43710/121875 [13:51:13<23:51:37,  1.10s/it]

{'loss': 0.7044, 'learning_rate': 3.206769230769231e-05, 'epoch': 1.08}


 36%|███▌      | 43740/121875 [13:51:46<23:54:37,  1.10s/it]

{'loss': 0.6617, 'learning_rate': 3.2055384615384616e-05, 'epoch': 1.08}


 36%|███▌      | 43770/121875 [13:52:19<23:51:53,  1.10s/it]

{'loss': 0.7285, 'learning_rate': 3.204307692307693e-05, 'epoch': 1.08}


 36%|███▌      | 43800/121875 [13:52:52<23:56:11,  1.10s/it]

{'loss': 0.7401, 'learning_rate': 3.2030769230769234e-05, 'epoch': 1.08}


 36%|███▌      | 43830/121875 [13:53:25<23:47:15,  1.10s/it]

{'loss': 0.7288, 'learning_rate': 3.201846153846154e-05, 'epoch': 1.08}


 36%|███▌      | 43860/121875 [13:53:58<23:51:57,  1.10s/it]

{'loss': 0.7044, 'learning_rate': 3.2006153846153846e-05, 'epoch': 1.08}


 36%|███▌      | 43890/121875 [13:54:31<23:58:53,  1.11s/it]

{'loss': 0.7204, 'learning_rate': 3.199384615384616e-05, 'epoch': 1.08}


 36%|███▌      | 43920/121875 [13:55:04<23:48:32,  1.10s/it]

{'loss': 0.6676, 'learning_rate': 3.1981538461538464e-05, 'epoch': 1.08}


 36%|███▌      | 43950/121875 [13:55:37<23:41:19,  1.09s/it]

{'loss': 0.7715, 'learning_rate': 3.1969230769230776e-05, 'epoch': 1.08}


 36%|███▌      | 43980/121875 [13:56:10<23:48:48,  1.10s/it]

{'loss': 0.7165, 'learning_rate': 3.1956923076923075e-05, 'epoch': 1.08}


 36%|███▌      | 44010/121875 [13:56:45<24:24:31,  1.13s/it]

{'loss': 0.658, 'learning_rate': 3.194461538461538e-05, 'epoch': 1.08}


 36%|███▌      | 44040/121875 [13:57:18<23:43:26,  1.10s/it]

{'loss': 0.7085, 'learning_rate': 3.1932307692307694e-05, 'epoch': 1.08}


 36%|███▌      | 44070/121875 [13:57:51<23:42:53,  1.10s/it]

{'loss': 0.7011, 'learning_rate': 3.192e-05, 'epoch': 1.08}


 36%|███▌      | 44100/121875 [13:58:24<23:44:22,  1.10s/it]

{'loss': 0.738, 'learning_rate': 3.190769230769231e-05, 'epoch': 1.09}


 36%|███▌      | 44130/121875 [13:58:57<23:50:34,  1.10s/it]

{'loss': 0.7234, 'learning_rate': 3.189538461538462e-05, 'epoch': 1.09}


 36%|███▌      | 44160/121875 [13:59:30<23:46:44,  1.10s/it]

{'loss': 0.7201, 'learning_rate': 3.188307692307692e-05, 'epoch': 1.09}


 36%|███▋      | 44190/121875 [14:00:03<24:22:49,  1.13s/it]

{'loss': 0.7782, 'learning_rate': 3.187076923076923e-05, 'epoch': 1.09}


 36%|███▋      | 44220/121875 [14:00:36<23:40:17,  1.10s/it]

{'loss': 0.76, 'learning_rate': 3.185846153846154e-05, 'epoch': 1.09}


 36%|███▋      | 44250/121875 [14:01:09<23:41:58,  1.10s/it]

{'loss': 0.703, 'learning_rate': 3.184615384615385e-05, 'epoch': 1.09}


 36%|███▋      | 44280/121875 [14:01:42<23:44:57,  1.10s/it]

{'loss': 0.6827, 'learning_rate': 3.183384615384615e-05, 'epoch': 1.09}


 36%|███▋      | 44310/121875 [14:02:15<23:39:39,  1.10s/it]

{'loss': 0.6868, 'learning_rate': 3.1821538461538465e-05, 'epoch': 1.09}


 36%|███▋      | 44340/121875 [14:02:48<23:39:10,  1.10s/it]

{'loss': 0.7765, 'learning_rate': 3.180923076923077e-05, 'epoch': 1.09}


 36%|███▋      | 44370/121875 [14:03:21<23:43:45,  1.10s/it]

{'loss': 0.7657, 'learning_rate': 3.179692307692308e-05, 'epoch': 1.09}


 36%|███▋      | 44400/121875 [14:03:54<23:38:02,  1.10s/it]

{'loss': 0.6636, 'learning_rate': 3.178461538461538e-05, 'epoch': 1.09}


 36%|███▋      | 44430/121875 [14:04:27<23:43:54,  1.10s/it]

{'loss': 0.6651, 'learning_rate': 3.1772307692307695e-05, 'epoch': 1.09}


 36%|███▋      | 44460/121875 [14:05:00<23:36:31,  1.10s/it]

{'loss': 0.6354, 'learning_rate': 3.176e-05, 'epoch': 1.09}


 37%|███▋      | 44490/121875 [14:05:33<23:41:12,  1.10s/it]

{'loss': 0.7285, 'learning_rate': 3.174769230769231e-05, 'epoch': 1.1}


 37%|███▋      | 44520/121875 [14:06:08<23:37:59,  1.10s/it]

{'loss': 0.7378, 'learning_rate': 3.173538461538462e-05, 'epoch': 1.1}


 37%|███▋      | 44550/121875 [14:06:41<23:41:35,  1.10s/it]

{'loss': 0.7594, 'learning_rate': 3.1723076923076925e-05, 'epoch': 1.1}


 37%|███▋      | 44580/121875 [14:07:14<23:37:27,  1.10s/it]

{'loss': 0.6583, 'learning_rate': 3.171076923076923e-05, 'epoch': 1.1}


 37%|███▋      | 44610/121875 [14:07:47<23:34:54,  1.10s/it]

{'loss': 0.6738, 'learning_rate': 3.1698461538461536e-05, 'epoch': 1.1}


 37%|███▋      | 44640/121875 [14:08:20<23:39:37,  1.10s/it]

{'loss': 0.7731, 'learning_rate': 3.168615384615385e-05, 'epoch': 1.1}


 37%|███▋      | 44670/121875 [14:08:53<23:40:27,  1.10s/it]

{'loss': 0.7073, 'learning_rate': 3.1673846153846154e-05, 'epoch': 1.1}


 37%|███▋      | 44700/121875 [14:09:26<23:35:21,  1.10s/it]

{'loss': 0.7018, 'learning_rate': 3.166153846153847e-05, 'epoch': 1.1}


 37%|███▋      | 44730/121875 [14:10:00<23:43:05,  1.11s/it]

{'loss': 0.762, 'learning_rate': 3.164923076923077e-05, 'epoch': 1.1}


 37%|███▋      | 44760/121875 [14:10:33<23:31:12,  1.10s/it]

{'loss': 0.6177, 'learning_rate': 3.163692307692308e-05, 'epoch': 1.1}


 37%|███▋      | 44790/121875 [14:11:06<23:27:37,  1.10s/it]

{'loss': 0.7592, 'learning_rate': 3.1624615384615384e-05, 'epoch': 1.1}


 37%|███▋      | 44820/121875 [14:11:38<23:30:37,  1.10s/it]

{'loss': 0.7698, 'learning_rate': 3.161230769230769e-05, 'epoch': 1.1}


 37%|███▋      | 44850/121875 [14:12:12<23:26:40,  1.10s/it]

{'loss': 0.7022, 'learning_rate': 3.16e-05, 'epoch': 1.1}


 37%|███▋      | 44880/121875 [14:12:44<23:32:36,  1.10s/it]

{'loss': 0.7012, 'learning_rate': 3.158769230769231e-05, 'epoch': 1.1}


 37%|███▋      | 44910/121875 [14:13:18<23:35:06,  1.10s/it]

{'loss': 0.7054, 'learning_rate': 3.157538461538462e-05, 'epoch': 1.11}


 37%|███▋      | 44940/121875 [14:13:50<23:28:48,  1.10s/it]

{'loss': 0.7014, 'learning_rate': 3.1563076923076926e-05, 'epoch': 1.11}


 37%|███▋      | 44970/121875 [14:14:23<23:18:22,  1.09s/it]

{'loss': 0.7823, 'learning_rate': 3.155076923076923e-05, 'epoch': 1.11}


 37%|███▋      | 45000/121875 [14:14:56<23:30:35,  1.10s/it]

{'loss': 0.6905, 'learning_rate': 3.153846153846154e-05, 'epoch': 1.11}


 37%|███▋      | 45030/121875 [14:15:32<23:34:35,  1.10s/it]

{'loss': 0.6964, 'learning_rate': 3.152615384615385e-05, 'epoch': 1.11}


 37%|███▋      | 45060/121875 [14:16:05<23:29:22,  1.10s/it]

{'loss': 0.7288, 'learning_rate': 3.1513846153846156e-05, 'epoch': 1.11}


 37%|███▋      | 45090/121875 [14:16:38<23:27:45,  1.10s/it]

{'loss': 0.7509, 'learning_rate': 3.150153846153847e-05, 'epoch': 1.11}


 37%|███▋      | 45120/121875 [14:17:11<23:29:58,  1.10s/it]

{'loss': 0.7643, 'learning_rate': 3.1489230769230774e-05, 'epoch': 1.11}


 37%|███▋      | 45150/121875 [14:17:44<23:34:06,  1.11s/it]

{'loss': 0.7123, 'learning_rate': 3.147692307692307e-05, 'epoch': 1.11}


 37%|███▋      | 45180/121875 [14:18:17<23:28:51,  1.10s/it]

{'loss': 0.6916, 'learning_rate': 3.1464615384615386e-05, 'epoch': 1.11}


 37%|███▋      | 45210/121875 [14:18:50<23:23:49,  1.10s/it]

{'loss': 0.6859, 'learning_rate': 3.145230769230769e-05, 'epoch': 1.11}


 37%|███▋      | 45240/121875 [14:19:23<23:24:13,  1.10s/it]

{'loss': 0.5978, 'learning_rate': 3.1440000000000004e-05, 'epoch': 1.11}


 37%|███▋      | 45270/121875 [14:19:56<23:20:05,  1.10s/it]

{'loss': 0.7834, 'learning_rate': 3.142769230769231e-05, 'epoch': 1.11}


 37%|███▋      | 45300/121875 [14:20:29<23:20:05,  1.10s/it]

{'loss': 0.6373, 'learning_rate': 3.1415384615384615e-05, 'epoch': 1.12}


 37%|███▋      | 45330/121875 [14:21:02<23:19:38,  1.10s/it]

{'loss': 0.7566, 'learning_rate': 3.140307692307692e-05, 'epoch': 1.12}


 37%|███▋      | 45360/121875 [14:21:35<23:24:54,  1.10s/it]

{'loss': 0.7326, 'learning_rate': 3.1390769230769234e-05, 'epoch': 1.12}


 37%|███▋      | 45390/121875 [14:22:08<23:27:55,  1.10s/it]

{'loss': 0.7838, 'learning_rate': 3.137846153846154e-05, 'epoch': 1.12}


 37%|███▋      | 45420/121875 [14:22:41<23:13:39,  1.09s/it]

{'loss': 0.6911, 'learning_rate': 3.1366153846153845e-05, 'epoch': 1.12}


 37%|███▋      | 45450/121875 [14:23:14<23:26:53,  1.10s/it]

{'loss': 0.6872, 'learning_rate': 3.135384615384616e-05, 'epoch': 1.12}


 37%|███▋      | 45480/121875 [14:23:47<23:28:27,  1.11s/it]

{'loss': 0.7394, 'learning_rate': 3.134153846153846e-05, 'epoch': 1.12}


 37%|███▋      | 45510/121875 [14:24:22<23:57:53,  1.13s/it]

{'loss': 0.6698, 'learning_rate': 3.132923076923077e-05, 'epoch': 1.12}


 37%|███▋      | 45540/121875 [14:24:55<23:24:42,  1.10s/it]

{'loss': 0.7529, 'learning_rate': 3.1316923076923075e-05, 'epoch': 1.12}


 37%|███▋      | 45570/121875 [14:25:28<23:11:47,  1.09s/it]

{'loss': 0.7677, 'learning_rate': 3.130461538461539e-05, 'epoch': 1.12}


 37%|███▋      | 45600/121875 [14:26:01<23:16:12,  1.10s/it]

{'loss': 0.7083, 'learning_rate': 3.129230769230769e-05, 'epoch': 1.12}


 37%|███▋      | 45630/121875 [14:26:34<23:20:57,  1.10s/it]

{'loss': 0.7736, 'learning_rate': 3.1280000000000005e-05, 'epoch': 1.12}


 37%|███▋      | 45660/121875 [14:27:07<23:18:08,  1.10s/it]

{'loss': 0.7264, 'learning_rate': 3.126769230769231e-05, 'epoch': 1.12}


 37%|███▋      | 45690/121875 [14:27:41<23:17:06,  1.10s/it]

{'loss': 0.7597, 'learning_rate': 3.125538461538462e-05, 'epoch': 1.12}


 38%|███▊      | 45720/121875 [14:28:14<23:19:23,  1.10s/it]

{'loss': 0.6948, 'learning_rate': 3.124307692307692e-05, 'epoch': 1.13}


 38%|███▊      | 45750/121875 [14:28:47<23:16:36,  1.10s/it]

{'loss': 0.6969, 'learning_rate': 3.123076923076923e-05, 'epoch': 1.13}


 38%|███▊      | 45780/121875 [14:29:20<23:22:40,  1.11s/it]

{'loss': 0.6795, 'learning_rate': 3.121846153846154e-05, 'epoch': 1.13}


 38%|███▊      | 45810/121875 [14:29:53<23:15:24,  1.10s/it]

{'loss': 0.7996, 'learning_rate': 3.1206153846153847e-05, 'epoch': 1.13}


 38%|███▊      | 45840/121875 [14:30:26<23:18:05,  1.10s/it]

{'loss': 0.7276, 'learning_rate': 3.119384615384616e-05, 'epoch': 1.13}


 38%|███▊      | 45870/121875 [14:30:59<23:07:24,  1.10s/it]

{'loss': 0.7525, 'learning_rate': 3.1181538461538465e-05, 'epoch': 1.13}


 38%|███▊      | 45900/121875 [14:31:32<23:14:49,  1.10s/it]

{'loss': 0.7431, 'learning_rate': 3.116923076923077e-05, 'epoch': 1.13}


 38%|███▊      | 45930/121875 [14:32:05<23:13:50,  1.10s/it]

{'loss': 0.7079, 'learning_rate': 3.1156923076923076e-05, 'epoch': 1.13}


 38%|███▊      | 45960/121875 [14:32:38<23:09:15,  1.10s/it]

{'loss': 0.7602, 'learning_rate': 3.114461538461539e-05, 'epoch': 1.13}


 38%|███▊      | 45990/121875 [14:33:11<23:09:18,  1.10s/it]

{'loss': 0.7285, 'learning_rate': 3.1132307692307694e-05, 'epoch': 1.13}


 38%|███▊      | 46020/121875 [14:33:46<23:13:55,  1.10s/it]

{'loss': 0.7501, 'learning_rate': 3.112e-05, 'epoch': 1.13}


 38%|███▊      | 46050/121875 [14:34:19<23:08:01,  1.10s/it]

{'loss': 0.7596, 'learning_rate': 3.110769230769231e-05, 'epoch': 1.13}


 38%|███▊      | 46080/121875 [14:34:52<23:04:50,  1.10s/it]

{'loss': 0.6866, 'learning_rate': 3.109538461538462e-05, 'epoch': 1.13}


 38%|███▊      | 46110/121875 [14:35:25<23:09:15,  1.10s/it]

{'loss': 0.8142, 'learning_rate': 3.1083076923076924e-05, 'epoch': 1.14}


 38%|███▊      | 46140/121875 [14:35:58<23:04:05,  1.10s/it]

{'loss': 0.6829, 'learning_rate': 3.107076923076923e-05, 'epoch': 1.14}


 38%|███▊      | 46170/121875 [14:36:31<23:01:41,  1.10s/it]

{'loss': 0.7291, 'learning_rate': 3.105846153846154e-05, 'epoch': 1.14}


 38%|███▊      | 46200/121875 [14:37:04<23:00:03,  1.09s/it]

{'loss': 0.7209, 'learning_rate': 3.104615384615385e-05, 'epoch': 1.14}


 38%|███▊      | 46230/121875 [14:37:37<23:09:23,  1.10s/it]

{'loss': 0.72, 'learning_rate': 3.103384615384616e-05, 'epoch': 1.14}


 38%|███▊      | 46260/121875 [14:38:10<23:02:37,  1.10s/it]

{'loss': 0.7107, 'learning_rate': 3.1021538461538466e-05, 'epoch': 1.14}


 38%|███▊      | 46290/121875 [14:38:43<23:06:28,  1.10s/it]

{'loss': 0.7282, 'learning_rate': 3.1009230769230765e-05, 'epoch': 1.14}


 38%|███▊      | 46320/121875 [14:39:16<23:00:13,  1.10s/it]

{'loss': 0.6656, 'learning_rate': 3.099692307692308e-05, 'epoch': 1.14}


 38%|███▊      | 46350/121875 [14:39:49<23:03:57,  1.10s/it]

{'loss': 0.7283, 'learning_rate': 3.0984615384615384e-05, 'epoch': 1.14}


 38%|███▊      | 46380/121875 [14:40:22<23:01:47,  1.10s/it]

{'loss': 0.743, 'learning_rate': 3.0972307692307696e-05, 'epoch': 1.14}


 38%|███▊      | 46410/121875 [14:40:55<23:02:25,  1.10s/it]

{'loss': 0.6679, 'learning_rate': 3.096e-05, 'epoch': 1.14}


 38%|███▊      | 46440/121875 [14:41:28<22:59:10,  1.10s/it]

{'loss': 0.6753, 'learning_rate': 3.0947692307692314e-05, 'epoch': 1.14}


 38%|███▊      | 46470/121875 [14:42:01<22:58:39,  1.10s/it]

{'loss': 0.7276, 'learning_rate': 3.093538461538461e-05, 'epoch': 1.14}


 38%|███▊      | 46500/121875 [14:42:34<22:54:42,  1.09s/it]

{'loss': 0.7244, 'learning_rate': 3.0923076923076926e-05, 'epoch': 1.14}


 38%|███▊      | 46530/121875 [14:43:09<22:59:37,  1.10s/it]

{'loss': 0.7056, 'learning_rate': 3.091076923076923e-05, 'epoch': 1.15}


 38%|███▊      | 46560/121875 [14:43:42<22:57:26,  1.10s/it]

{'loss': 0.7359, 'learning_rate': 3.089846153846154e-05, 'epoch': 1.15}


 38%|███▊      | 46590/121875 [14:44:15<23:00:27,  1.10s/it]

{'loss': 0.7273, 'learning_rate': 3.088615384615385e-05, 'epoch': 1.15}


 38%|███▊      | 46620/121875 [14:44:48<22:56:30,  1.10s/it]

{'loss': 0.6763, 'learning_rate': 3.0873846153846155e-05, 'epoch': 1.15}


 38%|███▊      | 46650/121875 [14:45:21<23:06:02,  1.11s/it]

{'loss': 0.7088, 'learning_rate': 3.086153846153846e-05, 'epoch': 1.15}


 38%|███▊      | 46680/121875 [14:45:54<23:03:07,  1.10s/it]

{'loss': 0.6979, 'learning_rate': 3.084923076923077e-05, 'epoch': 1.15}


 38%|███▊      | 46710/121875 [14:46:27<22:52:23,  1.10s/it]

{'loss': 0.7037, 'learning_rate': 3.083692307692308e-05, 'epoch': 1.15}


 38%|███▊      | 46740/121875 [14:47:00<22:52:30,  1.10s/it]

{'loss': 0.7138, 'learning_rate': 3.0824615384615385e-05, 'epoch': 1.15}


 38%|███▊      | 46770/121875 [14:47:33<23:00:45,  1.10s/it]

{'loss': 0.7101, 'learning_rate': 3.08123076923077e-05, 'epoch': 1.15}


 38%|███▊      | 46800/121875 [14:48:06<22:46:14,  1.09s/it]

{'loss': 0.75, 'learning_rate': 3.08e-05, 'epoch': 1.15}


 38%|███▊      | 46830/121875 [14:48:39<22:57:03,  1.10s/it]

{'loss': 0.7099, 'learning_rate': 3.078769230769231e-05, 'epoch': 1.15}


 38%|███▊      | 46860/121875 [14:49:12<22:54:28,  1.10s/it]

{'loss': 0.696, 'learning_rate': 3.0775384615384615e-05, 'epoch': 1.15}


 38%|███▊      | 46890/121875 [14:49:45<22:51:33,  1.10s/it]

{'loss': 0.6689, 'learning_rate': 3.076307692307692e-05, 'epoch': 1.15}


 38%|███▊      | 46920/121875 [14:50:18<22:42:16,  1.09s/it]

{'loss': 0.7054, 'learning_rate': 3.075076923076923e-05, 'epoch': 1.15}


 39%|███▊      | 46950/121875 [14:50:51<22:46:46,  1.09s/it]

{'loss': 0.7893, 'learning_rate': 3.073846153846154e-05, 'epoch': 1.16}


 39%|███▊      | 46980/121875 [14:51:24<22:57:37,  1.10s/it]

{'loss': 0.7451, 'learning_rate': 3.072615384615385e-05, 'epoch': 1.16}


 39%|███▊      | 47010/121875 [14:51:59<23:22:43,  1.12s/it]

{'loss': 0.7681, 'learning_rate': 3.071384615384616e-05, 'epoch': 1.16}


 39%|███▊      | 47040/121875 [14:52:32<23:00:52,  1.11s/it]

{'loss': 0.6736, 'learning_rate': 3.070153846153846e-05, 'epoch': 1.16}


 39%|███▊      | 47070/121875 [14:53:05<22:47:35,  1.10s/it]

{'loss': 0.7183, 'learning_rate': 3.068923076923077e-05, 'epoch': 1.16}


 39%|███▊      | 47100/121875 [14:53:38<22:48:55,  1.10s/it]

{'loss': 0.7061, 'learning_rate': 3.067692307692308e-05, 'epoch': 1.16}


 39%|███▊      | 47130/121875 [14:54:11<22:39:54,  1.09s/it]

{'loss': 0.6891, 'learning_rate': 3.0664615384615387e-05, 'epoch': 1.16}


 39%|███▊      | 47160/121875 [14:54:44<22:52:03,  1.10s/it]

{'loss': 0.7403, 'learning_rate': 3.065230769230769e-05, 'epoch': 1.16}


 39%|███▊      | 47190/121875 [14:55:17<22:47:07,  1.10s/it]

{'loss': 0.7154, 'learning_rate': 3.0640000000000005e-05, 'epoch': 1.16}


 39%|███▊      | 47220/121875 [14:55:50<22:53:00,  1.10s/it]

{'loss': 0.6586, 'learning_rate': 3.062769230769231e-05, 'epoch': 1.16}


 39%|███▉      | 47250/121875 [14:56:23<22:44:57,  1.10s/it]

{'loss': 0.703, 'learning_rate': 3.0615384615384616e-05, 'epoch': 1.16}


 39%|███▉      | 47280/121875 [14:56:56<22:48:01,  1.10s/it]

{'loss': 0.6692, 'learning_rate': 3.060307692307692e-05, 'epoch': 1.16}


 39%|███▉      | 47310/121875 [14:57:29<22:50:41,  1.10s/it]

{'loss': 0.7516, 'learning_rate': 3.0590769230769234e-05, 'epoch': 1.16}


 39%|███▉      | 47340/121875 [14:58:02<22:42:23,  1.10s/it]

{'loss': 0.7071, 'learning_rate': 3.057846153846154e-05, 'epoch': 1.17}


 39%|███▉      | 47370/121875 [14:58:35<22:45:29,  1.10s/it]

{'loss': 0.7805, 'learning_rate': 3.056615384615385e-05, 'epoch': 1.17}


 39%|███▉      | 47400/121875 [14:59:08<22:50:50,  1.10s/it]

{'loss': 0.7115, 'learning_rate': 3.055384615384616e-05, 'epoch': 1.17}


 39%|███▉      | 47430/121875 [14:59:41<22:49:12,  1.10s/it]

{'loss': 0.7275, 'learning_rate': 3.054153846153846e-05, 'epoch': 1.17}


 39%|███▉      | 47460/121875 [15:00:14<22:45:44,  1.10s/it]

{'loss': 0.7157, 'learning_rate': 3.052923076923077e-05, 'epoch': 1.17}


 39%|███▉      | 47490/121875 [15:00:47<22:49:37,  1.10s/it]

{'loss': 0.73, 'learning_rate': 3.0516923076923076e-05, 'epoch': 1.17}


 39%|███▉      | 47520/121875 [15:01:22<22:42:39,  1.10s/it]

{'loss': 0.7384, 'learning_rate': 3.0504615384615388e-05, 'epoch': 1.17}


 39%|███▉      | 47550/121875 [15:01:55<22:41:15,  1.10s/it]

{'loss': 0.7294, 'learning_rate': 3.0492307692307694e-05, 'epoch': 1.17}


 39%|███▉      | 47580/121875 [15:02:28<22:49:00,  1.11s/it]

{'loss': 0.7493, 'learning_rate': 3.0480000000000003e-05, 'epoch': 1.17}


 39%|███▉      | 47610/121875 [15:03:01<22:42:44,  1.10s/it]

{'loss': 0.7227, 'learning_rate': 3.046769230769231e-05, 'epoch': 1.17}


 39%|███▉      | 47640/121875 [15:03:34<22:40:36,  1.10s/it]

{'loss': 0.7249, 'learning_rate': 3.0455384615384618e-05, 'epoch': 1.17}


 39%|███▉      | 47670/121875 [15:04:07<22:39:35,  1.10s/it]

{'loss': 0.6784, 'learning_rate': 3.0443076923076924e-05, 'epoch': 1.17}


 39%|███▉      | 47700/121875 [15:04:40<22:39:51,  1.10s/it]

{'loss': 0.7655, 'learning_rate': 3.0430769230769236e-05, 'epoch': 1.17}


 39%|███▉      | 47730/121875 [15:05:13<22:41:53,  1.10s/it]

{'loss': 0.7039, 'learning_rate': 3.0418461538461542e-05, 'epoch': 1.17}


 39%|███▉      | 47760/121875 [15:05:46<22:39:22,  1.10s/it]

{'loss': 0.7511, 'learning_rate': 3.0406153846153844e-05, 'epoch': 1.18}


 39%|███▉      | 47790/121875 [15:06:19<22:36:48,  1.10s/it]

{'loss': 0.7225, 'learning_rate': 3.0393846153846157e-05, 'epoch': 1.18}


 39%|███▉      | 47820/121875 [15:06:52<22:36:42,  1.10s/it]

{'loss': 0.636, 'learning_rate': 3.0381538461538462e-05, 'epoch': 1.18}


 39%|███▉      | 47850/121875 [15:07:25<22:41:27,  1.10s/it]

{'loss': 0.733, 'learning_rate': 3.036923076923077e-05, 'epoch': 1.18}


 39%|███▉      | 47880/121875 [15:07:58<22:36:46,  1.10s/it]

{'loss': 0.7084, 'learning_rate': 3.0356923076923077e-05, 'epoch': 1.18}


 39%|███▉      | 47910/121875 [15:08:31<22:29:27,  1.09s/it]

{'loss': 0.7044, 'learning_rate': 3.034461538461539e-05, 'epoch': 1.18}


 39%|███▉      | 47940/121875 [15:09:04<22:33:05,  1.10s/it]

{'loss': 0.6995, 'learning_rate': 3.0332307692307692e-05, 'epoch': 1.18}


 39%|███▉      | 47970/121875 [15:09:37<22:32:05,  1.10s/it]

{'loss': 0.7369, 'learning_rate': 3.0320000000000004e-05, 'epoch': 1.18}


 39%|███▉      | 48000/121875 [15:10:10<22:28:17,  1.10s/it]

{'loss': 0.6315, 'learning_rate': 3.030769230769231e-05, 'epoch': 1.18}


 39%|███▉      | 48030/121875 [15:10:45<22:33:41,  1.10s/it]

{'loss': 0.7101, 'learning_rate': 3.0295384615384616e-05, 'epoch': 1.18}


 39%|███▉      | 48060/121875 [15:11:18<22:27:39,  1.10s/it]

{'loss': 0.759, 'learning_rate': 3.0283076923076925e-05, 'epoch': 1.18}


 39%|███▉      | 48090/121875 [15:11:51<22:31:24,  1.10s/it]

{'loss': 0.7354, 'learning_rate': 3.027076923076923e-05, 'epoch': 1.18}


 39%|███▉      | 48120/121875 [15:12:24<22:35:19,  1.10s/it]

{'loss': 0.7191, 'learning_rate': 3.025846153846154e-05, 'epoch': 1.18}


 40%|███▉      | 48150/121875 [15:12:57<22:29:44,  1.10s/it]

{'loss': 0.7131, 'learning_rate': 3.0246153846153846e-05, 'epoch': 1.19}


 40%|███▉      | 48180/121875 [15:13:30<22:32:30,  1.10s/it]

{'loss': 0.6907, 'learning_rate': 3.0233846153846158e-05, 'epoch': 1.19}


 40%|███▉      | 48210/121875 [15:14:03<22:35:00,  1.10s/it]

{'loss': 0.7592, 'learning_rate': 3.0221538461538464e-05, 'epoch': 1.19}


 40%|███▉      | 48240/121875 [15:14:36<22:32:02,  1.10s/it]

{'loss': 0.7251, 'learning_rate': 3.0209230769230773e-05, 'epoch': 1.19}


 40%|███▉      | 48270/121875 [15:15:09<22:32:35,  1.10s/it]

{'loss': 0.6733, 'learning_rate': 3.019692307692308e-05, 'epoch': 1.19}


 40%|███▉      | 48300/121875 [15:15:42<22:31:18,  1.10s/it]

{'loss': 0.7335, 'learning_rate': 3.0184615384615384e-05, 'epoch': 1.19}


 40%|███▉      | 48330/121875 [15:16:15<22:32:03,  1.10s/it]

{'loss': 0.7047, 'learning_rate': 3.0172307692307694e-05, 'epoch': 1.19}


 40%|███▉      | 48360/121875 [15:16:48<22:20:55,  1.09s/it]

{'loss': 0.6949, 'learning_rate': 3.016e-05, 'epoch': 1.19}


 40%|███▉      | 48390/121875 [15:17:21<22:21:27,  1.10s/it]

{'loss': 0.5983, 'learning_rate': 3.0147692307692312e-05, 'epoch': 1.19}


 40%|███▉      | 48420/121875 [15:17:54<22:28:59,  1.10s/it]

{'loss': 0.6482, 'learning_rate': 3.0135384615384614e-05, 'epoch': 1.19}


 40%|███▉      | 48450/121875 [15:18:27<22:32:09,  1.10s/it]

{'loss': 0.7386, 'learning_rate': 3.0123076923076927e-05, 'epoch': 1.19}


 40%|███▉      | 48480/121875 [15:19:00<22:23:00,  1.10s/it]

{'loss': 0.7066, 'learning_rate': 3.0110769230769232e-05, 'epoch': 1.19}


 40%|███▉      | 48510/121875 [15:19:35<22:57:29,  1.13s/it]

{'loss': 0.7423, 'learning_rate': 3.009846153846154e-05, 'epoch': 1.19}


 40%|███▉      | 48540/121875 [15:20:08<22:24:02,  1.10s/it]

{'loss': 0.7127, 'learning_rate': 3.0086153846153847e-05, 'epoch': 1.19}


 40%|███▉      | 48570/121875 [15:20:41<22:24:15,  1.10s/it]

{'loss': 0.6888, 'learning_rate': 3.007384615384616e-05, 'epoch': 1.2}


 40%|███▉      | 48600/121875 [15:21:14<22:23:35,  1.10s/it]

{'loss': 0.7303, 'learning_rate': 3.0061538461538462e-05, 'epoch': 1.2}


 40%|███▉      | 48630/121875 [15:21:47<22:20:30,  1.10s/it]

{'loss': 0.7624, 'learning_rate': 3.0049230769230768e-05, 'epoch': 1.2}


 40%|███▉      | 48660/121875 [15:22:20<22:26:07,  1.10s/it]

{'loss': 0.7239, 'learning_rate': 3.003692307692308e-05, 'epoch': 1.2}


 40%|███▉      | 48690/121875 [15:22:53<22:22:00,  1.10s/it]

{'loss': 0.8307, 'learning_rate': 3.0024615384615386e-05, 'epoch': 1.2}


 40%|███▉      | 48720/121875 [15:23:27<22:19:47,  1.10s/it]

{'loss': 0.7207, 'learning_rate': 3.0012307692307695e-05, 'epoch': 1.2}


 40%|████      | 48750/121875 [15:23:59<22:12:02,  1.09s/it]

{'loss': 0.744, 'learning_rate': 3e-05, 'epoch': 1.2}


 40%|████      | 48780/121875 [15:24:32<22:18:23,  1.10s/it]

{'loss': 0.7042, 'learning_rate': 2.998769230769231e-05, 'epoch': 1.2}


 40%|████      | 48810/121875 [15:25:06<22:18:13,  1.10s/it]

{'loss': 0.7048, 'learning_rate': 2.9975384615384616e-05, 'epoch': 1.2}


 40%|████      | 48840/121875 [15:25:38<22:17:54,  1.10s/it]

{'loss': 0.6494, 'learning_rate': 2.9963076923076928e-05, 'epoch': 1.2}


 40%|████      | 48870/121875 [15:26:12<22:17:53,  1.10s/it]

{'loss': 0.7272, 'learning_rate': 2.9950769230769234e-05, 'epoch': 1.2}


 40%|████      | 48900/121875 [15:26:45<22:17:38,  1.10s/it]

{'loss': 0.725, 'learning_rate': 2.9938461538461536e-05, 'epoch': 1.2}


 40%|████      | 48930/121875 [15:27:18<22:08:00,  1.09s/it]

{'loss': 0.7339, 'learning_rate': 2.992615384615385e-05, 'epoch': 1.2}


 40%|████      | 48960/121875 [15:27:50<22:20:59,  1.10s/it]

{'loss': 0.731, 'learning_rate': 2.9913846153846154e-05, 'epoch': 1.21}


 40%|████      | 48990/121875 [15:28:24<22:17:10,  1.10s/it]

{'loss': 0.6756, 'learning_rate': 2.9901538461538464e-05, 'epoch': 1.21}


 40%|████      | 49020/121875 [15:28:59<22:18:03,  1.10s/it]

{'loss': 0.6814, 'learning_rate': 2.988923076923077e-05, 'epoch': 1.21}


 40%|████      | 49050/121875 [15:29:32<22:13:50,  1.10s/it]

{'loss': 0.7452, 'learning_rate': 2.9876923076923082e-05, 'epoch': 1.21}


 40%|████      | 49080/121875 [15:30:05<22:08:20,  1.09s/it]

{'loss': 0.6913, 'learning_rate': 2.9864615384615384e-05, 'epoch': 1.21}


 40%|████      | 49110/121875 [15:30:38<22:12:35,  1.10s/it]

{'loss': 0.7135, 'learning_rate': 2.9852307692307697e-05, 'epoch': 1.21}


 40%|████      | 49140/121875 [15:31:11<22:16:12,  1.10s/it]

{'loss': 0.699, 'learning_rate': 2.9840000000000002e-05, 'epoch': 1.21}


 40%|████      | 49170/121875 [15:31:44<22:11:27,  1.10s/it]

{'loss': 0.7381, 'learning_rate': 2.9827692307692308e-05, 'epoch': 1.21}


 40%|████      | 49200/121875 [15:32:17<22:14:09,  1.10s/it]

{'loss': 0.728, 'learning_rate': 2.9815384615384617e-05, 'epoch': 1.21}


 40%|████      | 49230/121875 [15:32:50<22:14:33,  1.10s/it]

{'loss': 0.7272, 'learning_rate': 2.9803076923076923e-05, 'epoch': 1.21}


 40%|████      | 49260/121875 [15:33:23<22:09:51,  1.10s/it]

{'loss': 0.6635, 'learning_rate': 2.9790769230769232e-05, 'epoch': 1.21}


 40%|████      | 49290/121875 [15:33:56<22:08:57,  1.10s/it]

{'loss': 0.7634, 'learning_rate': 2.9778461538461538e-05, 'epoch': 1.21}


 40%|████      | 49320/121875 [15:34:29<22:11:50,  1.10s/it]

{'loss': 0.6671, 'learning_rate': 2.976615384615385e-05, 'epoch': 1.21}


 40%|████      | 49350/121875 [15:35:02<22:12:36,  1.10s/it]

{'loss': 0.6774, 'learning_rate': 2.9753846153846156e-05, 'epoch': 1.21}


 41%|████      | 49380/121875 [15:35:35<22:03:39,  1.10s/it]

{'loss': 0.7003, 'learning_rate': 2.9741538461538465e-05, 'epoch': 1.22}


 41%|████      | 49410/121875 [15:36:08<22:05:39,  1.10s/it]

{'loss': 0.762, 'learning_rate': 2.972923076923077e-05, 'epoch': 1.22}


 41%|████      | 49440/121875 [15:36:41<22:05:45,  1.10s/it]

{'loss': 0.7873, 'learning_rate': 2.971692307692308e-05, 'epoch': 1.22}


 41%|████      | 49470/121875 [15:37:14<22:04:24,  1.10s/it]

{'loss': 0.6862, 'learning_rate': 2.9704615384615386e-05, 'epoch': 1.22}


 41%|████      | 49500/121875 [15:37:47<22:09:59,  1.10s/it]

{'loss': 0.6728, 'learning_rate': 2.969230769230769e-05, 'epoch': 1.22}


 41%|████      | 49530/121875 [15:38:22<22:08:00,  1.10s/it]

{'loss': 0.7449, 'learning_rate': 2.9680000000000004e-05, 'epoch': 1.22}


 41%|████      | 49560/121875 [15:38:55<22:03:29,  1.10s/it]

{'loss': 0.6855, 'learning_rate': 2.9667692307692306e-05, 'epoch': 1.22}


 41%|████      | 49590/121875 [15:39:28<22:07:15,  1.10s/it]

{'loss': 0.7157, 'learning_rate': 2.965538461538462e-05, 'epoch': 1.22}


 41%|████      | 49620/121875 [15:40:01<21:57:51,  1.09s/it]

{'loss': 0.6608, 'learning_rate': 2.9643076923076924e-05, 'epoch': 1.22}


 41%|████      | 49650/121875 [15:40:34<22:08:38,  1.10s/it]

{'loss': 0.7096, 'learning_rate': 2.9630769230769234e-05, 'epoch': 1.22}


 41%|████      | 49680/121875 [15:41:07<21:59:47,  1.10s/it]

{'loss': 0.6707, 'learning_rate': 2.961846153846154e-05, 'epoch': 1.22}


 41%|████      | 49710/121875 [15:41:40<22:02:08,  1.10s/it]

{'loss': 0.6917, 'learning_rate': 2.9606153846153852e-05, 'epoch': 1.22}


 41%|████      | 49740/121875 [15:42:13<22:04:13,  1.10s/it]

{'loss': 0.6701, 'learning_rate': 2.9593846153846154e-05, 'epoch': 1.22}


 41%|████      | 49770/121875 [15:42:46<21:56:07,  1.10s/it]

{'loss': 0.7146, 'learning_rate': 2.958153846153846e-05, 'epoch': 1.23}


 41%|████      | 49800/121875 [15:43:19<21:59:11,  1.10s/it]

{'loss': 0.7285, 'learning_rate': 2.9569230769230772e-05, 'epoch': 1.23}


 41%|████      | 49830/121875 [15:43:52<22:10:38,  1.11s/it]

{'loss': 0.7135, 'learning_rate': 2.9556923076923078e-05, 'epoch': 1.23}


 41%|████      | 49860/121875 [15:44:25<21:54:25,  1.10s/it]

{'loss': 0.724, 'learning_rate': 2.9544615384615387e-05, 'epoch': 1.23}


 41%|████      | 49890/121875 [15:44:58<22:02:26,  1.10s/it]

{'loss': 0.6636, 'learning_rate': 2.9532307692307693e-05, 'epoch': 1.23}


 41%|████      | 49920/121875 [15:45:31<21:58:22,  1.10s/it]

{'loss': 0.7898, 'learning_rate': 2.9520000000000002e-05, 'epoch': 1.23}


 41%|████      | 49950/121875 [15:46:04<22:00:31,  1.10s/it]

{'loss': 0.7091, 'learning_rate': 2.9507692307692308e-05, 'epoch': 1.23}


 41%|████      | 49980/121875 [15:46:37<21:54:47,  1.10s/it]

{'loss': 0.7212, 'learning_rate': 2.949538461538462e-05, 'epoch': 1.23}


 41%|████      | 50010/121875 [15:47:13<22:31:57,  1.13s/it]

{'loss': 0.7157, 'learning_rate': 2.9483076923076926e-05, 'epoch': 1.23}


 41%|████      | 50040/121875 [15:47:46<21:56:41,  1.10s/it]

{'loss': 0.7302, 'learning_rate': 2.9470769230769228e-05, 'epoch': 1.23}


 41%|████      | 50070/121875 [15:48:19<21:59:16,  1.10s/it]

{'loss': 0.7615, 'learning_rate': 2.945846153846154e-05, 'epoch': 1.23}


 41%|████      | 50100/121875 [15:48:52<21:54:52,  1.10s/it]

{'loss': 0.7528, 'learning_rate': 2.9446153846153846e-05, 'epoch': 1.23}


 41%|████      | 50130/121875 [15:49:25<21:55:03,  1.10s/it]

{'loss': 0.7612, 'learning_rate': 2.9433846153846156e-05, 'epoch': 1.23}


 41%|████      | 50160/121875 [15:49:58<21:59:09,  1.10s/it]

{'loss': 0.7613, 'learning_rate': 2.942153846153846e-05, 'epoch': 1.23}


 41%|████      | 50190/121875 [15:50:31<21:55:26,  1.10s/it]

{'loss': 0.6926, 'learning_rate': 2.9409230769230774e-05, 'epoch': 1.24}


 41%|████      | 50220/121875 [15:51:04<21:46:43,  1.09s/it]

{'loss': 0.6673, 'learning_rate': 2.9396923076923076e-05, 'epoch': 1.24}


 41%|████      | 50250/121875 [15:51:37<21:52:41,  1.10s/it]

{'loss': 0.7341, 'learning_rate': 2.938461538461539e-05, 'epoch': 1.24}


 41%|████▏     | 50280/121875 [15:52:10<21:50:08,  1.10s/it]

{'loss': 0.6745, 'learning_rate': 2.9372307692307694e-05, 'epoch': 1.24}


 41%|████▏     | 50310/121875 [15:52:43<21:56:38,  1.10s/it]

{'loss': 0.6858, 'learning_rate': 2.9360000000000003e-05, 'epoch': 1.24}


 41%|████▏     | 50340/121875 [15:53:16<21:53:25,  1.10s/it]

{'loss': 0.7617, 'learning_rate': 2.934769230769231e-05, 'epoch': 1.24}


 41%|████▏     | 50370/121875 [15:53:49<21:47:17,  1.10s/it]

{'loss': 0.7662, 'learning_rate': 2.9335384615384615e-05, 'epoch': 1.24}


 41%|████▏     | 50400/121875 [15:54:22<21:52:18,  1.10s/it]

{'loss': 0.7867, 'learning_rate': 2.9323076923076924e-05, 'epoch': 1.24}


 41%|████▏     | 50430/121875 [15:54:55<21:52:18,  1.10s/it]

{'loss': 0.7255, 'learning_rate': 2.931076923076923e-05, 'epoch': 1.24}


 41%|████▏     | 50460/121875 [15:55:28<21:44:51,  1.10s/it]

{'loss': 0.7024, 'learning_rate': 2.9298461538461542e-05, 'epoch': 1.24}


 41%|████▏     | 50490/121875 [15:56:01<21:40:49,  1.09s/it]

{'loss': 0.6588, 'learning_rate': 2.9286153846153848e-05, 'epoch': 1.24}


 41%|████▏     | 50520/121875 [15:56:36<21:43:56,  1.10s/it]

{'loss': 0.7119, 'learning_rate': 2.9273846153846157e-05, 'epoch': 1.24}


 41%|████▏     | 50550/121875 [15:57:09<21:50:47,  1.10s/it]

{'loss': 0.8129, 'learning_rate': 2.9261538461538463e-05, 'epoch': 1.24}


 42%|████▏     | 50580/121875 [15:57:42<21:51:47,  1.10s/it]

{'loss': 0.6779, 'learning_rate': 2.9249230769230772e-05, 'epoch': 1.25}


 42%|████▏     | 50610/121875 [15:58:15<21:42:51,  1.10s/it]

{'loss': 0.7022, 'learning_rate': 2.9236923076923078e-05, 'epoch': 1.25}


 42%|████▏     | 50640/121875 [15:58:48<21:42:00,  1.10s/it]

{'loss': 0.7545, 'learning_rate': 2.9224615384615383e-05, 'epoch': 1.25}


 42%|████▏     | 50670/121875 [15:59:21<21:41:14,  1.10s/it]

{'loss': 0.701, 'learning_rate': 2.9212307692307696e-05, 'epoch': 1.25}


 42%|████▏     | 50700/121875 [15:59:54<21:41:54,  1.10s/it]

{'loss': 0.7571, 'learning_rate': 2.9199999999999998e-05, 'epoch': 1.25}


 42%|████▏     | 50730/121875 [16:00:27<21:41:00,  1.10s/it]

{'loss': 0.7553, 'learning_rate': 2.918769230769231e-05, 'epoch': 1.25}


 42%|████▏     | 50760/121875 [16:01:00<21:37:51,  1.10s/it]

{'loss': 0.7236, 'learning_rate': 2.9175384615384616e-05, 'epoch': 1.25}


 42%|████▏     | 50790/121875 [16:01:33<21:38:10,  1.10s/it]

{'loss': 0.713, 'learning_rate': 2.9163076923076926e-05, 'epoch': 1.25}


 42%|████▏     | 50820/121875 [16:02:06<21:41:53,  1.10s/it]

{'loss': 0.6651, 'learning_rate': 2.915076923076923e-05, 'epoch': 1.25}


 42%|████▏     | 50850/121875 [16:02:39<21:37:36,  1.10s/it]

{'loss': 0.7069, 'learning_rate': 2.9138461538461544e-05, 'epoch': 1.25}


 42%|████▏     | 50880/121875 [16:03:12<21:45:36,  1.10s/it]

{'loss': 0.6469, 'learning_rate': 2.9126153846153846e-05, 'epoch': 1.25}


 42%|████▏     | 50910/121875 [16:03:45<21:35:49,  1.10s/it]

{'loss': 0.701, 'learning_rate': 2.9113846153846152e-05, 'epoch': 1.25}


 42%|████▏     | 50940/121875 [16:04:18<21:33:28,  1.09s/it]

{'loss': 0.7637, 'learning_rate': 2.9101538461538464e-05, 'epoch': 1.25}


 42%|████▏     | 50970/121875 [16:04:51<21:37:42,  1.10s/it]

{'loss': 0.6825, 'learning_rate': 2.908923076923077e-05, 'epoch': 1.25}


 42%|████▏     | 51000/121875 [16:05:24<21:36:58,  1.10s/it]

{'loss': 0.7748, 'learning_rate': 2.907692307692308e-05, 'epoch': 1.26}


 42%|████▏     | 51030/121875 [16:05:59<21:40:42,  1.10s/it]

{'loss': 0.6994, 'learning_rate': 2.9064615384615385e-05, 'epoch': 1.26}


 42%|████▏     | 51060/121875 [16:06:32<21:31:20,  1.09s/it]

{'loss': 0.7082, 'learning_rate': 2.9052307692307694e-05, 'epoch': 1.26}


 42%|████▏     | 51090/121875 [16:07:05<21:35:53,  1.10s/it]

{'loss': 0.717, 'learning_rate': 2.904e-05, 'epoch': 1.26}


 42%|████▏     | 51120/121875 [16:07:38<21:38:50,  1.10s/it]

{'loss': 0.6908, 'learning_rate': 2.9027692307692312e-05, 'epoch': 1.26}


 42%|████▏     | 51150/121875 [16:08:11<21:32:40,  1.10s/it]

{'loss': 0.7663, 'learning_rate': 2.9015384615384618e-05, 'epoch': 1.26}


 42%|████▏     | 51180/121875 [16:08:44<21:32:16,  1.10s/it]

{'loss': 0.7665, 'learning_rate': 2.900307692307692e-05, 'epoch': 1.26}


 42%|████▏     | 51210/121875 [16:09:17<21:35:34,  1.10s/it]

{'loss': 0.6624, 'learning_rate': 2.8990769230769233e-05, 'epoch': 1.26}


 42%|████▏     | 51240/121875 [16:09:50<21:35:31,  1.10s/it]

{'loss': 0.6844, 'learning_rate': 2.897846153846154e-05, 'epoch': 1.26}


 42%|████▏     | 51270/121875 [16:10:23<21:21:39,  1.09s/it]

{'loss': 0.698, 'learning_rate': 2.8966153846153848e-05, 'epoch': 1.26}


 42%|████▏     | 51300/121875 [16:10:56<21:32:40,  1.10s/it]

{'loss': 0.746, 'learning_rate': 2.8953846153846153e-05, 'epoch': 1.26}


 42%|████▏     | 51330/121875 [16:11:28<21:21:26,  1.09s/it]

{'loss': 0.6816, 'learning_rate': 2.8941538461538466e-05, 'epoch': 1.26}


 42%|████▏     | 51360/121875 [16:12:01<21:32:27,  1.10s/it]

{'loss': 0.691, 'learning_rate': 2.8929230769230768e-05, 'epoch': 1.26}


 42%|████▏     | 51390/121875 [16:12:34<21:30:34,  1.10s/it]

{'loss': 0.6859, 'learning_rate': 2.891692307692308e-05, 'epoch': 1.26}


 42%|████▏     | 51420/121875 [16:13:07<21:31:22,  1.10s/it]

{'loss': 0.7279, 'learning_rate': 2.8904615384615386e-05, 'epoch': 1.27}


 42%|████▏     | 51450/121875 [16:13:40<21:32:14,  1.10s/it]

{'loss': 0.6757, 'learning_rate': 2.8892307692307696e-05, 'epoch': 1.27}


 42%|████▏     | 51480/121875 [16:14:13<21:29:29,  1.10s/it]

{'loss': 0.684, 'learning_rate': 2.888e-05, 'epoch': 1.27}


 42%|████▏     | 51510/121875 [16:14:48<21:58:23,  1.12s/it]

{'loss': 0.7108, 'learning_rate': 2.8867692307692307e-05, 'epoch': 1.27}


 42%|████▏     | 51540/121875 [16:15:21<21:29:13,  1.10s/it]

{'loss': 0.7629, 'learning_rate': 2.8855384615384616e-05, 'epoch': 1.27}


 42%|████▏     | 51570/121875 [16:15:54<21:26:47,  1.10s/it]

{'loss': 0.7409, 'learning_rate': 2.8843076923076922e-05, 'epoch': 1.27}


 42%|████▏     | 51600/121875 [16:16:27<21:23:35,  1.10s/it]

{'loss': 0.7681, 'learning_rate': 2.8830769230769234e-05, 'epoch': 1.27}


 42%|████▏     | 51630/121875 [16:17:00<21:33:36,  1.10s/it]

{'loss': 0.7577, 'learning_rate': 2.881846153846154e-05, 'epoch': 1.27}


 42%|████▏     | 51660/121875 [16:17:33<21:28:52,  1.10s/it]

{'loss': 0.7629, 'learning_rate': 2.880615384615385e-05, 'epoch': 1.27}


 42%|████▏     | 51690/121875 [16:18:06<21:24:47,  1.10s/it]

{'loss': 0.776, 'learning_rate': 2.8793846153846155e-05, 'epoch': 1.27}


 42%|████▏     | 51720/121875 [16:18:39<21:20:56,  1.10s/it]

{'loss': 0.7007, 'learning_rate': 2.8781538461538464e-05, 'epoch': 1.27}


 42%|████▏     | 51750/121875 [16:19:12<21:24:36,  1.10s/it]

{'loss': 0.6739, 'learning_rate': 2.876923076923077e-05, 'epoch': 1.27}


 42%|████▏     | 51780/121875 [16:19:45<21:28:09,  1.10s/it]

{'loss': 0.6849, 'learning_rate': 2.8756923076923076e-05, 'epoch': 1.27}


 43%|████▎     | 51810/121875 [16:20:18<21:21:27,  1.10s/it]

{'loss': 0.7487, 'learning_rate': 2.8744615384615388e-05, 'epoch': 1.28}


 43%|████▎     | 51840/121875 [16:20:51<21:23:17,  1.10s/it]

{'loss': 0.7159, 'learning_rate': 2.873230769230769e-05, 'epoch': 1.28}


 43%|████▎     | 51870/121875 [16:21:24<21:24:53,  1.10s/it]

{'loss': 0.7604, 'learning_rate': 2.8720000000000003e-05, 'epoch': 1.28}


 43%|████▎     | 51900/121875 [16:21:57<21:25:12,  1.10s/it]

{'loss': 0.7459, 'learning_rate': 2.870769230769231e-05, 'epoch': 1.28}


 43%|████▎     | 51930/121875 [16:22:30<21:24:54,  1.10s/it]

{'loss': 0.7051, 'learning_rate': 2.8695384615384618e-05, 'epoch': 1.28}


 43%|████▎     | 51960/121875 [16:23:03<21:21:09,  1.10s/it]

{'loss': 0.6959, 'learning_rate': 2.8683076923076923e-05, 'epoch': 1.28}


 43%|████▎     | 51990/121875 [16:23:36<21:19:13,  1.10s/it]

{'loss': 0.7864, 'learning_rate': 2.8670769230769236e-05, 'epoch': 1.28}


 43%|████▎     | 52020/121875 [16:24:11<21:21:09,  1.10s/it]

{'loss': 0.7135, 'learning_rate': 2.8658461538461538e-05, 'epoch': 1.28}


 43%|████▎     | 52050/121875 [16:24:44<21:19:38,  1.10s/it]

{'loss': 0.652, 'learning_rate': 2.8646153846153844e-05, 'epoch': 1.28}


 43%|████▎     | 52080/121875 [16:25:17<21:14:03,  1.10s/it]

{'loss': 0.626, 'learning_rate': 2.8633846153846156e-05, 'epoch': 1.28}


 43%|████▎     | 52110/121875 [16:25:50<21:12:30,  1.09s/it]

{'loss': 0.7613, 'learning_rate': 2.8621538461538462e-05, 'epoch': 1.28}


 43%|████▎     | 52140/121875 [16:26:23<21:15:30,  1.10s/it]

{'loss': 0.6185, 'learning_rate': 2.860923076923077e-05, 'epoch': 1.28}


 43%|████▎     | 52170/121875 [16:26:56<21:23:41,  1.10s/it]

{'loss': 0.6994, 'learning_rate': 2.8596923076923077e-05, 'epoch': 1.28}


 43%|████▎     | 52200/121875 [16:27:29<21:10:46,  1.09s/it]

{'loss': 0.6988, 'learning_rate': 2.8584615384615386e-05, 'epoch': 1.28}


 43%|████▎     | 52230/121875 [16:28:02<21:13:21,  1.10s/it]

{'loss': 0.6593, 'learning_rate': 2.8572307692307692e-05, 'epoch': 1.29}


 43%|████▎     | 52260/121875 [16:28:35<21:10:49,  1.10s/it]

{'loss': 0.6943, 'learning_rate': 2.8560000000000004e-05, 'epoch': 1.29}


 43%|████▎     | 52290/121875 [16:29:08<21:21:25,  1.10s/it]

{'loss': 0.7164, 'learning_rate': 2.854769230769231e-05, 'epoch': 1.29}


 43%|████▎     | 52320/121875 [16:29:41<21:23:16,  1.11s/it]

{'loss': 0.7706, 'learning_rate': 2.853538461538462e-05, 'epoch': 1.29}


 43%|████▎     | 52350/121875 [16:30:14<21:09:35,  1.10s/it]

{'loss': 0.7808, 'learning_rate': 2.8523076923076925e-05, 'epoch': 1.29}


 43%|████▎     | 52380/121875 [16:30:47<21:16:10,  1.10s/it]

{'loss': 0.6959, 'learning_rate': 2.851076923076923e-05, 'epoch': 1.29}


 43%|████▎     | 52410/121875 [16:31:20<21:15:50,  1.10s/it]

{'loss': 0.7473, 'learning_rate': 2.849846153846154e-05, 'epoch': 1.29}


 43%|████▎     | 52440/121875 [16:31:53<21:08:19,  1.10s/it]

{'loss': 0.7557, 'learning_rate': 2.8486153846153846e-05, 'epoch': 1.29}


 43%|████▎     | 52470/121875 [16:32:26<21:11:59,  1.10s/it]

{'loss': 0.6629, 'learning_rate': 2.8473846153846158e-05, 'epoch': 1.29}


 43%|████▎     | 52500/121875 [16:32:59<21:19:14,  1.11s/it]

{'loss': 0.7297, 'learning_rate': 2.846153846153846e-05, 'epoch': 1.29}


 43%|████▎     | 52530/121875 [16:33:34<21:02:30,  1.09s/it]

{'loss': 0.7375, 'learning_rate': 2.8449230769230773e-05, 'epoch': 1.29}


 43%|████▎     | 52560/121875 [16:34:07<21:05:49,  1.10s/it]

{'loss': 0.7137, 'learning_rate': 2.843692307692308e-05, 'epoch': 1.29}


 43%|████▎     | 52590/121875 [16:34:40<21:07:19,  1.10s/it]

{'loss': 0.6601, 'learning_rate': 2.8424615384615388e-05, 'epoch': 1.29}


 43%|████▎     | 52620/121875 [16:35:13<21:11:38,  1.10s/it]

{'loss': 0.7005, 'learning_rate': 2.8412307692307693e-05, 'epoch': 1.3}


 43%|████▎     | 52650/121875 [16:35:46<21:03:27,  1.10s/it]

{'loss': 0.6788, 'learning_rate': 2.84e-05, 'epoch': 1.3}


 43%|████▎     | 52680/121875 [16:36:19<21:13:41,  1.10s/it]

{'loss': 0.6955, 'learning_rate': 2.8387692307692308e-05, 'epoch': 1.3}


 43%|████▎     | 52710/121875 [16:36:51<21:07:23,  1.10s/it]

{'loss': 0.7385, 'learning_rate': 2.8375384615384614e-05, 'epoch': 1.3}


 43%|████▎     | 52740/121875 [16:37:24<21:05:51,  1.10s/it]

{'loss': 0.7812, 'learning_rate': 2.8363076923076926e-05, 'epoch': 1.3}


 43%|████▎     | 52770/121875 [16:37:57<21:07:44,  1.10s/it]

{'loss': 0.7319, 'learning_rate': 2.8350769230769232e-05, 'epoch': 1.3}


 43%|████▎     | 52800/121875 [16:38:30<21:07:21,  1.10s/it]

{'loss': 0.7003, 'learning_rate': 2.833846153846154e-05, 'epoch': 1.3}


 43%|████▎     | 52830/121875 [16:39:03<21:10:05,  1.10s/it]

{'loss': 0.7288, 'learning_rate': 2.8326153846153847e-05, 'epoch': 1.3}


 43%|████▎     | 52860/121875 [16:39:36<21:00:15,  1.10s/it]

{'loss': 0.6523, 'learning_rate': 2.8313846153846156e-05, 'epoch': 1.3}


 43%|████▎     | 52890/121875 [16:40:09<21:03:31,  1.10s/it]

{'loss': 0.6903, 'learning_rate': 2.8301538461538462e-05, 'epoch': 1.3}


 43%|████▎     | 52920/121875 [16:40:42<21:06:58,  1.10s/it]

{'loss': 0.7228, 'learning_rate': 2.8289230769230768e-05, 'epoch': 1.3}


 43%|████▎     | 52950/121875 [16:41:15<20:58:59,  1.10s/it]

{'loss': 0.7381, 'learning_rate': 2.827692307692308e-05, 'epoch': 1.3}


 43%|████▎     | 52980/121875 [16:41:48<20:57:09,  1.09s/it]

{'loss': 0.689, 'learning_rate': 2.8264615384615382e-05, 'epoch': 1.3}


 43%|████▎     | 53010/121875 [16:42:23<21:31:59,  1.13s/it]

{'loss': 0.6982, 'learning_rate': 2.8252307692307695e-05, 'epoch': 1.3}


 44%|████▎     | 53040/121875 [16:42:56<21:00:47,  1.10s/it]

{'loss': 0.7949, 'learning_rate': 2.824e-05, 'epoch': 1.31}


 44%|████▎     | 53070/121875 [16:43:29<20:56:45,  1.10s/it]

{'loss': 0.6682, 'learning_rate': 2.822769230769231e-05, 'epoch': 1.31}


 44%|████▎     | 53100/121875 [16:44:02<20:53:07,  1.09s/it]

{'loss': 0.7382, 'learning_rate': 2.8215384615384616e-05, 'epoch': 1.31}


 44%|████▎     | 53130/121875 [16:44:35<20:59:08,  1.10s/it]

{'loss': 0.7102, 'learning_rate': 2.8203076923076928e-05, 'epoch': 1.31}


 44%|████▎     | 53160/121875 [16:45:08<21:00:58,  1.10s/it]

{'loss': 0.7086, 'learning_rate': 2.819076923076923e-05, 'epoch': 1.31}


 44%|████▎     | 53190/121875 [16:45:41<20:51:46,  1.09s/it]

{'loss': 0.7054, 'learning_rate': 2.8178461538461543e-05, 'epoch': 1.31}


 44%|████▎     | 53220/121875 [16:46:14<21:03:02,  1.10s/it]

{'loss': 0.6638, 'learning_rate': 2.816615384615385e-05, 'epoch': 1.31}


 44%|████▎     | 53250/121875 [16:46:47<20:52:49,  1.10s/it]

{'loss': 0.7261, 'learning_rate': 2.8153846153846154e-05, 'epoch': 1.31}


 44%|████▎     | 53280/121875 [16:47:20<20:57:19,  1.10s/it]

{'loss': 0.676, 'learning_rate': 2.8141538461538463e-05, 'epoch': 1.31}


 44%|████▎     | 53310/121875 [16:47:53<20:53:23,  1.10s/it]

{'loss': 0.6958, 'learning_rate': 2.812923076923077e-05, 'epoch': 1.31}


 44%|████▍     | 53340/121875 [16:48:26<20:58:20,  1.10s/it]

{'loss': 0.6837, 'learning_rate': 2.8116923076923078e-05, 'epoch': 1.31}


 44%|████▍     | 53370/121875 [16:48:59<20:51:36,  1.10s/it]

{'loss': 0.7499, 'learning_rate': 2.8104615384615384e-05, 'epoch': 1.31}


 44%|████▍     | 53400/121875 [16:49:32<20:53:36,  1.10s/it]

{'loss': 0.6744, 'learning_rate': 2.8092307692307696e-05, 'epoch': 1.31}


 44%|████▍     | 53430/121875 [16:50:05<20:51:18,  1.10s/it]

{'loss': 0.7145, 'learning_rate': 2.8080000000000002e-05, 'epoch': 1.32}


 44%|████▍     | 53460/121875 [16:50:38<20:49:38,  1.10s/it]

{'loss': 0.7328, 'learning_rate': 2.806769230769231e-05, 'epoch': 1.32}


 44%|████▍     | 53490/121875 [16:51:11<20:54:18,  1.10s/it]

{'loss': 0.7384, 'learning_rate': 2.8055384615384617e-05, 'epoch': 1.32}


 44%|████▍     | 53520/121875 [16:51:46<20:43:01,  1.09s/it]

{'loss': 0.7452, 'learning_rate': 2.8043076923076923e-05, 'epoch': 1.32}


 44%|████▍     | 53550/121875 [16:52:19<20:49:57,  1.10s/it]

{'loss': 0.7173, 'learning_rate': 2.8030769230769232e-05, 'epoch': 1.32}


 44%|████▍     | 53580/121875 [16:52:52<20:49:42,  1.10s/it]

{'loss': 0.7515, 'learning_rate': 2.8018461538461538e-05, 'epoch': 1.32}


 44%|████▍     | 53610/121875 [16:53:25<20:47:05,  1.10s/it]

{'loss': 0.7146, 'learning_rate': 2.800615384615385e-05, 'epoch': 1.32}


 44%|████▍     | 53640/121875 [16:53:58<20:52:16,  1.10s/it]

{'loss': 0.7157, 'learning_rate': 2.7993846153846152e-05, 'epoch': 1.32}


 44%|████▍     | 53670/121875 [16:54:31<20:53:47,  1.10s/it]

{'loss': 0.6655, 'learning_rate': 2.7981538461538465e-05, 'epoch': 1.32}


 44%|████▍     | 53700/121875 [16:55:04<20:45:43,  1.10s/it]

{'loss': 0.6665, 'learning_rate': 2.796923076923077e-05, 'epoch': 1.32}


 44%|████▍     | 53730/121875 [16:55:37<20:50:49,  1.10s/it]

{'loss': 0.7444, 'learning_rate': 2.795692307692308e-05, 'epoch': 1.32}


 44%|████▍     | 53760/121875 [16:56:10<20:47:06,  1.10s/it]

{'loss': 0.7185, 'learning_rate': 2.7944615384615386e-05, 'epoch': 1.32}


 44%|████▍     | 53790/121875 [16:56:43<20:56:17,  1.11s/it]

{'loss': 0.6691, 'learning_rate': 2.793230769230769e-05, 'epoch': 1.32}


 44%|████▍     | 53820/121875 [16:57:16<20:46:19,  1.10s/it]

{'loss': 0.6973, 'learning_rate': 2.792e-05, 'epoch': 1.32}


 44%|████▍     | 53850/121875 [16:57:49<20:49:43,  1.10s/it]

{'loss': 0.6907, 'learning_rate': 2.7907692307692306e-05, 'epoch': 1.33}


 44%|████▍     | 53880/121875 [16:58:22<20:46:50,  1.10s/it]

{'loss': 0.6923, 'learning_rate': 2.789538461538462e-05, 'epoch': 1.33}


 44%|████▍     | 53910/121875 [16:58:55<20:43:14,  1.10s/it]

{'loss': 0.704, 'learning_rate': 2.7883076923076924e-05, 'epoch': 1.33}


 44%|████▍     | 53940/121875 [16:59:28<20:41:33,  1.10s/it]

{'loss': 0.7208, 'learning_rate': 2.7870769230769233e-05, 'epoch': 1.33}


 44%|████▍     | 53970/121875 [17:00:00<20:45:31,  1.10s/it]

{'loss': 0.6938, 'learning_rate': 2.785846153846154e-05, 'epoch': 1.33}


 44%|████▍     | 54000/121875 [17:00:33<20:34:39,  1.09s/it]

{'loss': 0.7338, 'learning_rate': 2.7846153846153848e-05, 'epoch': 1.33}


 44%|████▍     | 54030/121875 [17:01:09<20:43:06,  1.10s/it]

{'loss': 0.7923, 'learning_rate': 2.7833846153846154e-05, 'epoch': 1.33}


 44%|████▍     | 54060/121875 [17:01:42<20:40:52,  1.10s/it]

{'loss': 0.6772, 'learning_rate': 2.7821538461538466e-05, 'epoch': 1.33}


 44%|████▍     | 54090/121875 [17:02:15<20:40:14,  1.10s/it]

{'loss': 0.6811, 'learning_rate': 2.7809230769230772e-05, 'epoch': 1.33}


 44%|████▍     | 54120/121875 [17:02:48<20:42:35,  1.10s/it]

{'loss': 0.6965, 'learning_rate': 2.7796923076923075e-05, 'epoch': 1.33}


 44%|████▍     | 54150/121875 [17:03:21<20:39:32,  1.10s/it]

{'loss': 0.6548, 'learning_rate': 2.7784615384615387e-05, 'epoch': 1.33}


 44%|████▍     | 54180/121875 [17:03:54<20:34:31,  1.09s/it]

{'loss': 0.7302, 'learning_rate': 2.7772307692307693e-05, 'epoch': 1.33}


 44%|████▍     | 54210/121875 [17:04:26<20:29:39,  1.09s/it]

{'loss': 0.7323, 'learning_rate': 2.7760000000000002e-05, 'epoch': 1.33}


 45%|████▍     | 54240/121875 [17:04:59<20:35:01,  1.10s/it]

{'loss': 0.673, 'learning_rate': 2.7747692307692308e-05, 'epoch': 1.34}


 45%|████▍     | 54270/121875 [17:05:32<20:39:30,  1.10s/it]

{'loss': 0.7887, 'learning_rate': 2.773538461538462e-05, 'epoch': 1.34}


 45%|████▍     | 54300/121875 [17:06:06<20:38:01,  1.10s/it]

{'loss': 0.6644, 'learning_rate': 2.7723076923076922e-05, 'epoch': 1.34}


 45%|████▍     | 54330/121875 [17:06:39<20:33:00,  1.10s/it]

{'loss': 0.8021, 'learning_rate': 2.7710769230769235e-05, 'epoch': 1.34}


 45%|████▍     | 54360/121875 [17:07:11<20:31:44,  1.09s/it]

{'loss': 0.7274, 'learning_rate': 2.769846153846154e-05, 'epoch': 1.34}


 45%|████▍     | 54390/121875 [17:07:44<20:30:31,  1.09s/it]

{'loss': 0.6693, 'learning_rate': 2.7686153846153846e-05, 'epoch': 1.34}


 45%|████▍     | 54420/121875 [17:08:17<20:36:41,  1.10s/it]

{'loss': 0.7278, 'learning_rate': 2.7673846153846156e-05, 'epoch': 1.34}


 45%|████▍     | 54450/121875 [17:08:50<20:40:50,  1.10s/it]

{'loss': 0.6968, 'learning_rate': 2.766153846153846e-05, 'epoch': 1.34}


 45%|████▍     | 54480/121875 [17:09:23<20:38:57,  1.10s/it]

{'loss': 0.6959, 'learning_rate': 2.764923076923077e-05, 'epoch': 1.34}


 45%|████▍     | 54510/121875 [17:09:59<20:56:19,  1.12s/it]

{'loss': 0.7433, 'learning_rate': 2.7636923076923076e-05, 'epoch': 1.34}


 45%|████▍     | 54540/121875 [17:10:32<20:40:01,  1.10s/it]

{'loss': 0.6494, 'learning_rate': 2.762461538461539e-05, 'epoch': 1.34}


 45%|████▍     | 54570/121875 [17:11:04<20:35:26,  1.10s/it]

{'loss': 0.6837, 'learning_rate': 2.7612307692307694e-05, 'epoch': 1.34}


 45%|████▍     | 54600/121875 [17:11:37<20:32:42,  1.10s/it]

{'loss': 0.7409, 'learning_rate': 2.7600000000000003e-05, 'epoch': 1.34}


 45%|████▍     | 54630/121875 [17:12:10<20:34:58,  1.10s/it]

{'loss': 0.7655, 'learning_rate': 2.758769230769231e-05, 'epoch': 1.34}


 45%|████▍     | 54660/121875 [17:12:43<20:32:04,  1.10s/it]

{'loss': 0.7807, 'learning_rate': 2.7575384615384615e-05, 'epoch': 1.35}


 45%|████▍     | 54690/121875 [17:13:16<20:25:27,  1.09s/it]

{'loss': 0.7377, 'learning_rate': 2.7563076923076924e-05, 'epoch': 1.35}


 45%|████▍     | 54720/121875 [17:13:49<20:28:24,  1.10s/it]

{'loss': 0.727, 'learning_rate': 2.755076923076923e-05, 'epoch': 1.35}


 45%|████▍     | 54750/121875 [17:14:22<20:37:54,  1.11s/it]

{'loss': 0.6869, 'learning_rate': 2.7538461538461542e-05, 'epoch': 1.35}


 45%|████▍     | 54780/121875 [17:14:55<20:32:28,  1.10s/it]

{'loss': 0.7007, 'learning_rate': 2.7526153846153845e-05, 'epoch': 1.35}


 45%|████▍     | 54810/121875 [17:15:28<20:23:14,  1.09s/it]

{'loss': 0.7805, 'learning_rate': 2.7513846153846157e-05, 'epoch': 1.35}


 45%|████▍     | 54840/121875 [17:16:01<20:31:06,  1.10s/it]

{'loss': 0.6872, 'learning_rate': 2.7501538461538463e-05, 'epoch': 1.35}


 45%|████▌     | 54870/121875 [17:16:34<20:26:57,  1.10s/it]

{'loss': 0.7185, 'learning_rate': 2.7489230769230772e-05, 'epoch': 1.35}


 45%|████▌     | 54900/121875 [17:17:07<20:28:32,  1.10s/it]

{'loss': 0.6562, 'learning_rate': 2.7476923076923078e-05, 'epoch': 1.35}


 45%|████▌     | 54930/121875 [17:17:40<20:24:13,  1.10s/it]

{'loss': 0.7102, 'learning_rate': 2.746461538461539e-05, 'epoch': 1.35}


 45%|████▌     | 54960/121875 [17:18:13<20:26:09,  1.10s/it]

{'loss': 0.7261, 'learning_rate': 2.7452307692307692e-05, 'epoch': 1.35}


 45%|████▌     | 54990/121875 [17:18:46<20:28:42,  1.10s/it]

{'loss': 0.7588, 'learning_rate': 2.7439999999999998e-05, 'epoch': 1.35}


 45%|████▌     | 55020/121875 [17:19:22<20:21:41,  1.10s/it]

{'loss': 0.7382, 'learning_rate': 2.742769230769231e-05, 'epoch': 1.35}


 45%|████▌     | 55050/121875 [17:19:55<20:27:06,  1.10s/it]

{'loss': 0.7478, 'learning_rate': 2.7415384615384616e-05, 'epoch': 1.36}


 45%|████▌     | 55080/121875 [17:20:27<20:20:45,  1.10s/it]

{'loss': 0.7152, 'learning_rate': 2.7403076923076926e-05, 'epoch': 1.36}


 45%|████▌     | 55110/121875 [17:21:00<20:28:21,  1.10s/it]

{'loss': 0.721, 'learning_rate': 2.739076923076923e-05, 'epoch': 1.36}


 45%|████▌     | 55140/121875 [17:21:34<20:29:02,  1.10s/it]

{'loss': 0.7287, 'learning_rate': 2.737846153846154e-05, 'epoch': 1.36}


 45%|████▌     | 55170/121875 [17:22:07<20:22:11,  1.10s/it]

{'loss': 0.6838, 'learning_rate': 2.7366153846153846e-05, 'epoch': 1.36}


 45%|████▌     | 55200/121875 [17:22:40<20:15:05,  1.09s/it]

{'loss': 0.687, 'learning_rate': 2.735384615384616e-05, 'epoch': 1.36}


 45%|████▌     | 55230/121875 [17:23:13<20:21:43,  1.10s/it]

{'loss': 0.7725, 'learning_rate': 2.7341538461538464e-05, 'epoch': 1.36}


 45%|████▌     | 55260/121875 [17:23:46<20:15:08,  1.09s/it]

{'loss': 0.7029, 'learning_rate': 2.7329230769230767e-05, 'epoch': 1.36}


 45%|████▌     | 55290/121875 [17:24:19<20:22:08,  1.10s/it]

{'loss': 0.7313, 'learning_rate': 2.731692307692308e-05, 'epoch': 1.36}


 45%|████▌     | 55320/121875 [17:24:52<20:25:24,  1.10s/it]

{'loss': 0.7327, 'learning_rate': 2.7304615384615385e-05, 'epoch': 1.36}


 45%|████▌     | 55350/121875 [17:25:25<20:21:10,  1.10s/it]

{'loss': 0.7288, 'learning_rate': 2.7292307692307694e-05, 'epoch': 1.36}


 45%|████▌     | 55380/121875 [17:25:58<20:20:55,  1.10s/it]

{'loss': 0.6632, 'learning_rate': 2.728e-05, 'epoch': 1.36}


 45%|████▌     | 55410/121875 [17:26:31<20:18:25,  1.10s/it]

{'loss': 0.715, 'learning_rate': 2.7267692307692312e-05, 'epoch': 1.36}


 45%|████▌     | 55440/121875 [17:27:04<20:13:48,  1.10s/it]

{'loss': 0.6221, 'learning_rate': 2.7255384615384615e-05, 'epoch': 1.36}


 46%|████▌     | 55470/121875 [17:27:37<20:17:55,  1.10s/it]

{'loss': 0.7015, 'learning_rate': 2.7243076923076927e-05, 'epoch': 1.37}


 46%|████▌     | 55500/121875 [17:28:10<20:16:07,  1.10s/it]

{'loss': 0.6923, 'learning_rate': 2.7230769230769233e-05, 'epoch': 1.37}


 46%|████▌     | 55530/121875 [17:28:45<20:18:50,  1.10s/it]

{'loss': 0.7322, 'learning_rate': 2.721846153846154e-05, 'epoch': 1.37}


 46%|████▌     | 55560/121875 [17:29:18<20:21:39,  1.11s/it]

{'loss': 0.7183, 'learning_rate': 2.7206153846153848e-05, 'epoch': 1.37}


 46%|████▌     | 55590/121875 [17:29:51<20:13:03,  1.10s/it]

{'loss': 0.7277, 'learning_rate': 2.7193846153846153e-05, 'epoch': 1.37}


 46%|████▌     | 55620/121875 [17:30:24<20:13:00,  1.10s/it]

{'loss': 0.6614, 'learning_rate': 2.7181538461538462e-05, 'epoch': 1.37}


 46%|████▌     | 55650/121875 [17:30:57<20:13:13,  1.10s/it]

{'loss': 0.7738, 'learning_rate': 2.7169230769230768e-05, 'epoch': 1.37}


 46%|████▌     | 55680/121875 [17:31:30<20:15:39,  1.10s/it]

{'loss': 0.6878, 'learning_rate': 2.715692307692308e-05, 'epoch': 1.37}


 46%|████▌     | 55710/121875 [17:32:03<20:11:57,  1.10s/it]

{'loss': 0.734, 'learning_rate': 2.7144615384615386e-05, 'epoch': 1.37}


 46%|████▌     | 55740/121875 [17:32:36<20:12:57,  1.10s/it]

{'loss': 0.7743, 'learning_rate': 2.7132307692307696e-05, 'epoch': 1.37}


 46%|████▌     | 55770/121875 [17:33:09<20:03:19,  1.09s/it]

{'loss': 0.751, 'learning_rate': 2.712e-05, 'epoch': 1.37}


 46%|████▌     | 55800/121875 [17:33:42<20:12:44,  1.10s/it]

{'loss': 0.6944, 'learning_rate': 2.710769230769231e-05, 'epoch': 1.37}


 46%|████▌     | 55830/121875 [17:34:15<20:06:36,  1.10s/it]

{'loss': 0.7001, 'learning_rate': 2.7095384615384616e-05, 'epoch': 1.37}


 46%|████▌     | 55860/121875 [17:34:48<20:14:15,  1.10s/it]

{'loss': 0.6621, 'learning_rate': 2.7083076923076922e-05, 'epoch': 1.38}


 46%|████▌     | 55890/121875 [17:35:21<20:06:55,  1.10s/it]

{'loss': 0.6993, 'learning_rate': 2.7070769230769234e-05, 'epoch': 1.38}


 46%|████▌     | 55920/121875 [17:35:54<20:09:49,  1.10s/it]

{'loss': 0.7552, 'learning_rate': 2.7058461538461537e-05, 'epoch': 1.38}


 46%|████▌     | 55950/121875 [17:36:27<20:10:05,  1.10s/it]

{'loss': 0.718, 'learning_rate': 2.704615384615385e-05, 'epoch': 1.38}


 46%|████▌     | 55980/121875 [17:37:00<20:05:44,  1.10s/it]

{'loss': 0.7014, 'learning_rate': 2.7033846153846155e-05, 'epoch': 1.38}


 46%|████▌     | 56010/121875 [17:37:35<20:41:48,  1.13s/it]

{'loss': 0.7482, 'learning_rate': 2.7021538461538464e-05, 'epoch': 1.38}


 46%|████▌     | 56040/121875 [17:38:08<20:08:31,  1.10s/it]

{'loss': 0.6968, 'learning_rate': 2.700923076923077e-05, 'epoch': 1.38}


 46%|████▌     | 56070/121875 [17:38:41<20:03:44,  1.10s/it]

{'loss': 0.742, 'learning_rate': 2.6996923076923082e-05, 'epoch': 1.38}


 46%|████▌     | 56100/121875 [17:39:14<20:06:34,  1.10s/it]

{'loss': 0.7392, 'learning_rate': 2.6984615384615385e-05, 'epoch': 1.38}


 46%|████▌     | 56130/121875 [17:39:47<20:07:40,  1.10s/it]

{'loss': 0.7116, 'learning_rate': 2.697230769230769e-05, 'epoch': 1.38}


 46%|████▌     | 56160/121875 [17:40:20<20:08:09,  1.10s/it]

{'loss': 0.7142, 'learning_rate': 2.6960000000000003e-05, 'epoch': 1.38}


 46%|████▌     | 56190/121875 [17:40:53<20:02:50,  1.10s/it]

{'loss': 0.7011, 'learning_rate': 2.694769230769231e-05, 'epoch': 1.38}


 46%|████▌     | 56220/121875 [17:41:26<20:04:40,  1.10s/it]

{'loss': 0.7043, 'learning_rate': 2.6935384615384618e-05, 'epoch': 1.38}


 46%|████▌     | 56250/121875 [17:41:59<20:03:59,  1.10s/it]

{'loss': 0.7884, 'learning_rate': 2.6923076923076923e-05, 'epoch': 1.38}


 46%|████▌     | 56280/121875 [17:42:32<20:00:10,  1.10s/it]

{'loss': 0.6891, 'learning_rate': 2.6910769230769232e-05, 'epoch': 1.39}


 46%|████▌     | 56310/121875 [17:43:05<20:02:57,  1.10s/it]

{'loss': 0.7177, 'learning_rate': 2.6898461538461538e-05, 'epoch': 1.39}


 46%|████▌     | 56340/121875 [17:43:38<19:59:50,  1.10s/it]

{'loss': 0.7121, 'learning_rate': 2.688615384615385e-05, 'epoch': 1.39}


 46%|████▋     | 56370/121875 [17:44:11<20:03:21,  1.10s/it]

{'loss': 0.697, 'learning_rate': 2.6873846153846156e-05, 'epoch': 1.39}


 46%|████▋     | 56400/121875 [17:44:44<20:01:16,  1.10s/it]

{'loss': 0.7525, 'learning_rate': 2.686153846153846e-05, 'epoch': 1.39}


 46%|████▋     | 56430/121875 [17:45:17<20:01:17,  1.10s/it]

{'loss': 0.6817, 'learning_rate': 2.684923076923077e-05, 'epoch': 1.39}


 46%|████▋     | 56460/121875 [17:45:50<19:59:19,  1.10s/it]

{'loss': 0.656, 'learning_rate': 2.6836923076923077e-05, 'epoch': 1.39}


 46%|████▋     | 56490/121875 [17:46:23<19:59:07,  1.10s/it]

{'loss': 0.716, 'learning_rate': 2.6824615384615386e-05, 'epoch': 1.39}


 46%|████▋     | 56520/121875 [17:46:59<19:57:10,  1.10s/it]

{'loss': 0.7366, 'learning_rate': 2.6812307692307692e-05, 'epoch': 1.39}


 46%|████▋     | 56550/121875 [17:47:32<20:02:13,  1.10s/it]

{'loss': 0.7042, 'learning_rate': 2.6800000000000004e-05, 'epoch': 1.39}


 46%|████▋     | 56580/121875 [17:48:05<19:58:49,  1.10s/it]

{'loss': 0.7223, 'learning_rate': 2.6787692307692307e-05, 'epoch': 1.39}


 46%|████▋     | 56610/121875 [17:48:38<19:57:47,  1.10s/it]

{'loss': 0.6984, 'learning_rate': 2.677538461538462e-05, 'epoch': 1.39}


 46%|████▋     | 56640/121875 [17:49:11<19:49:50,  1.09s/it]

{'loss': 0.6931, 'learning_rate': 2.6763076923076925e-05, 'epoch': 1.39}


 46%|████▋     | 56670/121875 [17:49:44<19:57:27,  1.10s/it]

{'loss': 0.6654, 'learning_rate': 2.6750769230769234e-05, 'epoch': 1.39}


 47%|████▋     | 56700/121875 [17:50:17<19:53:56,  1.10s/it]

{'loss': 0.7021, 'learning_rate': 2.673846153846154e-05, 'epoch': 1.4}


 47%|████▋     | 56730/121875 [17:50:50<19:55:18,  1.10s/it]

{'loss': 0.7306, 'learning_rate': 2.6726153846153845e-05, 'epoch': 1.4}


 47%|████▋     | 56760/121875 [17:51:23<19:55:44,  1.10s/it]

{'loss': 0.6541, 'learning_rate': 2.6713846153846155e-05, 'epoch': 1.4}


 47%|████▋     | 56790/121875 [17:51:56<19:54:25,  1.10s/it]

{'loss': 0.6644, 'learning_rate': 2.670153846153846e-05, 'epoch': 1.4}


 47%|████▋     | 56820/121875 [17:52:29<19:44:49,  1.09s/it]

{'loss': 0.6927, 'learning_rate': 2.6689230769230773e-05, 'epoch': 1.4}


 47%|████▋     | 56850/121875 [17:53:02<19:48:35,  1.10s/it]

{'loss': 0.7097, 'learning_rate': 2.667692307692308e-05, 'epoch': 1.4}


 47%|████▋     | 56880/121875 [17:53:35<19:51:37,  1.10s/it]

{'loss': 0.7116, 'learning_rate': 2.6664615384615388e-05, 'epoch': 1.4}


 47%|████▋     | 56910/121875 [17:54:08<19:53:30,  1.10s/it]

{'loss': 0.728, 'learning_rate': 2.6652307692307693e-05, 'epoch': 1.4}


 47%|████▋     | 56940/121875 [17:54:41<19:55:13,  1.10s/it]

{'loss': 0.7157, 'learning_rate': 2.6640000000000002e-05, 'epoch': 1.4}


 47%|████▋     | 56970/121875 [17:55:14<19:48:30,  1.10s/it]

{'loss': 0.7047, 'learning_rate': 2.6627692307692308e-05, 'epoch': 1.4}


 47%|████▋     | 57000/121875 [17:55:47<19:54:12,  1.10s/it]

{'loss': 0.7021, 'learning_rate': 2.6615384615384614e-05, 'epoch': 1.4}


 47%|████▋     | 57030/121875 [17:56:22<19:44:44,  1.10s/it]

{'loss': 0.7195, 'learning_rate': 2.6603076923076926e-05, 'epoch': 1.4}


 47%|████▋     | 57060/121875 [17:56:55<19:50:17,  1.10s/it]

{'loss': 0.698, 'learning_rate': 2.659076923076923e-05, 'epoch': 1.4}


 47%|████▋     | 57090/121875 [17:57:28<19:46:40,  1.10s/it]

{'loss': 0.6985, 'learning_rate': 2.657846153846154e-05, 'epoch': 1.41}


 47%|████▋     | 57120/121875 [17:58:01<19:49:11,  1.10s/it]

{'loss': 0.7536, 'learning_rate': 2.6566153846153847e-05, 'epoch': 1.41}


 47%|████▋     | 57150/121875 [17:58:34<19:43:02,  1.10s/it]

{'loss': 0.7168, 'learning_rate': 2.6553846153846156e-05, 'epoch': 1.41}


 47%|████▋     | 57180/121875 [17:59:07<19:49:08,  1.10s/it]

{'loss': 0.6751, 'learning_rate': 2.6541538461538462e-05, 'epoch': 1.41}


 47%|████▋     | 57210/121875 [17:59:40<19:44:43,  1.10s/it]

{'loss': 0.6711, 'learning_rate': 2.6529230769230774e-05, 'epoch': 1.41}


 47%|████▋     | 57240/121875 [18:00:13<19:45:52,  1.10s/it]

{'loss': 0.6399, 'learning_rate': 2.6516923076923077e-05, 'epoch': 1.41}


 47%|████▋     | 57270/121875 [18:00:46<19:37:27,  1.09s/it]

{'loss': 0.7703, 'learning_rate': 2.6504615384615382e-05, 'epoch': 1.41}


 47%|████▋     | 57300/121875 [18:01:19<19:42:41,  1.10s/it]

{'loss': 0.7146, 'learning_rate': 2.6492307692307695e-05, 'epoch': 1.41}


 47%|████▋     | 57330/121875 [18:01:52<19:41:35,  1.10s/it]

{'loss': 0.6921, 'learning_rate': 2.648e-05, 'epoch': 1.41}


 47%|████▋     | 57360/121875 [18:02:25<19:46:01,  1.10s/it]

{'loss': 0.7569, 'learning_rate': 2.646769230769231e-05, 'epoch': 1.41}


 47%|████▋     | 57390/121875 [18:02:58<19:42:20,  1.10s/it]

{'loss': 0.6675, 'learning_rate': 2.6455384615384615e-05, 'epoch': 1.41}


 47%|████▋     | 57420/121875 [18:03:31<19:38:08,  1.10s/it]

{'loss': 0.7147, 'learning_rate': 2.6443076923076925e-05, 'epoch': 1.41}


 47%|████▋     | 57450/121875 [18:04:04<19:39:08,  1.10s/it]

{'loss': 0.6683, 'learning_rate': 2.643076923076923e-05, 'epoch': 1.41}


 47%|████▋     | 57480/121875 [18:04:37<19:38:53,  1.10s/it]

{'loss': 0.7174, 'learning_rate': 2.6418461538461543e-05, 'epoch': 1.41}


 47%|████▋     | 57510/121875 [18:05:12<20:09:37,  1.13s/it]

{'loss': 0.7429, 'learning_rate': 2.640615384615385e-05, 'epoch': 1.42}


 47%|████▋     | 57540/121875 [18:05:45<19:35:32,  1.10s/it]

{'loss': 0.6768, 'learning_rate': 2.6393846153846158e-05, 'epoch': 1.42}


 47%|████▋     | 57570/121875 [18:06:18<19:45:15,  1.11s/it]

{'loss': 0.6544, 'learning_rate': 2.6381538461538463e-05, 'epoch': 1.42}


 47%|████▋     | 57600/121875 [18:06:51<19:35:49,  1.10s/it]

{'loss': 0.7017, 'learning_rate': 2.636923076923077e-05, 'epoch': 1.42}


 47%|████▋     | 57630/121875 [18:07:24<19:40:34,  1.10s/it]

{'loss': 0.6747, 'learning_rate': 2.6356923076923078e-05, 'epoch': 1.42}


 47%|████▋     | 57660/121875 [18:07:57<19:32:20,  1.10s/it]

{'loss': 0.755, 'learning_rate': 2.6344615384615384e-05, 'epoch': 1.42}


 47%|████▋     | 57690/121875 [18:08:30<19:34:00,  1.10s/it]

{'loss': 0.6922, 'learning_rate': 2.6332307692307696e-05, 'epoch': 1.42}


 47%|████▋     | 57720/121875 [18:09:03<19:31:29,  1.10s/it]

{'loss': 0.7404, 'learning_rate': 2.632e-05, 'epoch': 1.42}


 47%|████▋     | 57750/121875 [18:09:36<19:36:36,  1.10s/it]

{'loss': 0.7165, 'learning_rate': 2.630769230769231e-05, 'epoch': 1.42}


 47%|████▋     | 57780/121875 [18:10:09<19:33:09,  1.10s/it]

{'loss': 0.7151, 'learning_rate': 2.6295384615384617e-05, 'epoch': 1.42}


 47%|████▋     | 57810/121875 [18:10:42<19:38:01,  1.10s/it]

{'loss': 0.7488, 'learning_rate': 2.6283076923076926e-05, 'epoch': 1.42}


 47%|████▋     | 57840/121875 [18:11:15<19:32:09,  1.10s/it]

{'loss': 0.6944, 'learning_rate': 2.6270769230769232e-05, 'epoch': 1.42}


 47%|████▋     | 57870/121875 [18:11:48<19:32:30,  1.10s/it]

{'loss': 0.6987, 'learning_rate': 2.6258461538461538e-05, 'epoch': 1.42}


 48%|████▊     | 57900/121875 [18:12:21<19:27:24,  1.09s/it]

{'loss': 0.7614, 'learning_rate': 2.6246153846153847e-05, 'epoch': 1.43}


 48%|████▊     | 57930/121875 [18:12:54<19:38:10,  1.11s/it]

{'loss': 0.6994, 'learning_rate': 2.6233846153846152e-05, 'epoch': 1.43}


 48%|████▊     | 57960/121875 [18:13:27<19:34:14,  1.10s/it]

{'loss': 0.7614, 'learning_rate': 2.6221538461538465e-05, 'epoch': 1.43}


 48%|████▊     | 57990/121875 [18:14:00<19:29:54,  1.10s/it]

{'loss': 0.6865, 'learning_rate': 2.620923076923077e-05, 'epoch': 1.43}


 48%|████▊     | 58020/121875 [18:14:35<19:31:53,  1.10s/it]

{'loss': 0.6742, 'learning_rate': 2.619692307692308e-05, 'epoch': 1.43}


 48%|████▊     | 58050/121875 [18:15:08<19:34:10,  1.10s/it]

{'loss': 0.7344, 'learning_rate': 2.6184615384615385e-05, 'epoch': 1.43}


 48%|████▊     | 58080/121875 [18:15:41<19:28:19,  1.10s/it]

{'loss': 0.712, 'learning_rate': 2.6172307692307695e-05, 'epoch': 1.43}


 48%|████▊     | 58110/121875 [18:16:14<19:25:46,  1.10s/it]

{'loss': 0.7447, 'learning_rate': 2.616e-05, 'epoch': 1.43}


 48%|████▊     | 58140/121875 [18:16:47<19:31:43,  1.10s/it]

{'loss': 0.6951, 'learning_rate': 2.6147692307692306e-05, 'epoch': 1.43}


 48%|████▊     | 58170/121875 [18:17:20<19:29:21,  1.10s/it]

{'loss': 0.7471, 'learning_rate': 2.613538461538462e-05, 'epoch': 1.43}


 48%|████▊     | 58200/121875 [18:17:53<19:28:29,  1.10s/it]

{'loss': 0.6852, 'learning_rate': 2.612307692307692e-05, 'epoch': 1.43}


 48%|████▊     | 58230/121875 [18:18:26<19:33:26,  1.11s/it]

{'loss': 0.666, 'learning_rate': 2.6110769230769233e-05, 'epoch': 1.43}


 48%|████▊     | 58260/121875 [18:18:59<19:28:10,  1.10s/it]

{'loss': 0.6672, 'learning_rate': 2.609846153846154e-05, 'epoch': 1.43}


 48%|████▊     | 58290/121875 [18:19:32<19:23:40,  1.10s/it]

{'loss': 0.6592, 'learning_rate': 2.6086153846153848e-05, 'epoch': 1.43}


 48%|████▊     | 58320/121875 [18:20:05<19:24:54,  1.10s/it]

{'loss': 0.7189, 'learning_rate': 2.6073846153846154e-05, 'epoch': 1.44}


 48%|████▊     | 58350/121875 [18:20:38<19:24:18,  1.10s/it]

{'loss': 0.7647, 'learning_rate': 2.6061538461538466e-05, 'epoch': 1.44}


 48%|████▊     | 58380/121875 [18:21:11<19:27:29,  1.10s/it]

{'loss': 0.6605, 'learning_rate': 2.604923076923077e-05, 'epoch': 1.44}


 48%|████▊     | 58410/121875 [18:21:44<19:24:01,  1.10s/it]

{'loss': 0.6593, 'learning_rate': 2.6036923076923074e-05, 'epoch': 1.44}


 48%|████▊     | 58440/121875 [18:22:17<19:26:05,  1.10s/it]

{'loss': 0.6449, 'learning_rate': 2.6024615384615387e-05, 'epoch': 1.44}


 48%|████▊     | 58470/121875 [18:22:50<19:22:10,  1.10s/it]

{'loss': 0.7802, 'learning_rate': 2.6012307692307693e-05, 'epoch': 1.44}


 48%|████▊     | 58500/121875 [18:23:23<19:23:48,  1.10s/it]

{'loss': 0.6691, 'learning_rate': 2.6000000000000002e-05, 'epoch': 1.44}


 48%|████▊     | 58530/121875 [18:23:58<19:25:07,  1.10s/it]

{'loss': 0.7911, 'learning_rate': 2.5987692307692308e-05, 'epoch': 1.44}


 48%|████▊     | 58560/121875 [18:24:31<19:20:37,  1.10s/it]

{'loss': 0.6731, 'learning_rate': 2.5975384615384617e-05, 'epoch': 1.44}


 48%|████▊     | 58590/121875 [18:25:04<19:21:17,  1.10s/it]

{'loss': 0.6747, 'learning_rate': 2.5963076923076922e-05, 'epoch': 1.44}


 48%|████▊     | 58620/121875 [18:25:37<19:22:10,  1.10s/it]

{'loss': 0.713, 'learning_rate': 2.5950769230769235e-05, 'epoch': 1.44}


 48%|████▊     | 58650/121875 [18:26:10<19:22:12,  1.10s/it]

{'loss': 0.749, 'learning_rate': 2.593846153846154e-05, 'epoch': 1.44}


 48%|████▊     | 58680/121875 [18:26:43<19:19:30,  1.10s/it]

{'loss': 0.7328, 'learning_rate': 2.592615384615385e-05, 'epoch': 1.44}


 48%|████▊     | 58710/121875 [18:27:16<19:21:50,  1.10s/it]

{'loss': 0.7458, 'learning_rate': 2.5913846153846155e-05, 'epoch': 1.45}


 48%|████▊     | 58740/121875 [18:27:49<19:20:23,  1.10s/it]

{'loss': 0.7011, 'learning_rate': 2.590153846153846e-05, 'epoch': 1.45}


 48%|████▊     | 58770/121875 [18:28:22<19:15:41,  1.10s/it]

{'loss': 0.6884, 'learning_rate': 2.588923076923077e-05, 'epoch': 1.45}


 48%|████▊     | 58800/121875 [18:28:55<19:14:15,  1.10s/it]

{'loss': 0.6801, 'learning_rate': 2.5876923076923076e-05, 'epoch': 1.45}


 48%|████▊     | 58830/121875 [18:29:28<19:19:59,  1.10s/it]

{'loss': 0.7174, 'learning_rate': 2.586461538461539e-05, 'epoch': 1.45}


 48%|████▊     | 58860/121875 [18:30:01<19:13:28,  1.10s/it]

{'loss': 0.6581, 'learning_rate': 2.585230769230769e-05, 'epoch': 1.45}


 48%|████▊     | 58890/121875 [18:30:34<19:14:38,  1.10s/it]

{'loss': 0.6931, 'learning_rate': 2.5840000000000003e-05, 'epoch': 1.45}


 48%|████▊     | 58920/121875 [18:31:07<19:10:09,  1.10s/it]

{'loss': 0.7569, 'learning_rate': 2.582769230769231e-05, 'epoch': 1.45}


 48%|████▊     | 58950/121875 [18:31:40<19:09:32,  1.10s/it]

{'loss': 0.6605, 'learning_rate': 2.5815384615384618e-05, 'epoch': 1.45}


 48%|████▊     | 58980/121875 [18:32:13<19:11:22,  1.10s/it]

{'loss': 0.7281, 'learning_rate': 2.5803076923076924e-05, 'epoch': 1.45}


 48%|████▊     | 59010/121875 [18:32:49<19:43:41,  1.13s/it]

{'loss': 0.7167, 'learning_rate': 2.579076923076923e-05, 'epoch': 1.45}


 48%|████▊     | 59040/121875 [18:33:22<19:14:39,  1.10s/it]

{'loss': 0.7083, 'learning_rate': 2.577846153846154e-05, 'epoch': 1.45}


 48%|████▊     | 59070/121875 [18:33:55<19:13:54,  1.10s/it]

{'loss': 0.7893, 'learning_rate': 2.5766153846153844e-05, 'epoch': 1.45}


 48%|████▊     | 59100/121875 [18:34:28<19:08:12,  1.10s/it]

{'loss': 0.7956, 'learning_rate': 2.5753846153846157e-05, 'epoch': 1.45}


 49%|████▊     | 59130/121875 [18:35:01<19:12:36,  1.10s/it]

{'loss': 0.7261, 'learning_rate': 2.5741538461538463e-05, 'epoch': 1.46}


 49%|████▊     | 59160/121875 [18:35:34<19:16:12,  1.11s/it]

{'loss': 0.7294, 'learning_rate': 2.5729230769230772e-05, 'epoch': 1.46}


 49%|████▊     | 59190/121875 [18:36:07<19:12:09,  1.10s/it]

{'loss': 0.6881, 'learning_rate': 2.5716923076923078e-05, 'epoch': 1.46}


 49%|████▊     | 59220/121875 [18:36:40<19:10:47,  1.10s/it]

{'loss': 0.6393, 'learning_rate': 2.5704615384615387e-05, 'epoch': 1.46}


 49%|████▊     | 59250/121875 [18:37:13<19:05:29,  1.10s/it]

{'loss': 0.6914, 'learning_rate': 2.5692307692307692e-05, 'epoch': 1.46}


 49%|████▊     | 59280/121875 [18:37:46<19:04:51,  1.10s/it]

{'loss': 0.7177, 'learning_rate': 2.5679999999999998e-05, 'epoch': 1.46}


 49%|████▊     | 59310/121875 [18:38:19<19:06:54,  1.10s/it]

{'loss': 0.6641, 'learning_rate': 2.566769230769231e-05, 'epoch': 1.46}


 49%|████▊     | 59340/121875 [18:38:52<19:03:27,  1.10s/it]

{'loss': 0.7472, 'learning_rate': 2.5655384615384613e-05, 'epoch': 1.46}


 49%|████▊     | 59370/121875 [18:39:25<19:00:39,  1.09s/it]

{'loss': 0.7184, 'learning_rate': 2.5643076923076925e-05, 'epoch': 1.46}


 49%|████▊     | 59400/121875 [18:39:58<19:08:27,  1.10s/it]

{'loss': 0.711, 'learning_rate': 2.563076923076923e-05, 'epoch': 1.46}


 49%|████▉     | 59430/121875 [18:40:31<19:05:06,  1.10s/it]

{'loss': 0.7653, 'learning_rate': 2.561846153846154e-05, 'epoch': 1.46}


 49%|████▉     | 59460/121875 [18:41:04<19:06:56,  1.10s/it]

{'loss': 0.6803, 'learning_rate': 2.5606153846153846e-05, 'epoch': 1.46}


 49%|████▉     | 59490/121875 [18:41:37<19:02:34,  1.10s/it]

{'loss': 0.7621, 'learning_rate': 2.559384615384616e-05, 'epoch': 1.46}


 49%|████▉     | 59520/121875 [18:42:12<19:06:58,  1.10s/it]

{'loss': 0.7335, 'learning_rate': 2.558153846153846e-05, 'epoch': 1.47}


 49%|████▉     | 59550/121875 [18:42:45<19:00:20,  1.10s/it]

{'loss': 0.7141, 'learning_rate': 2.5569230769230773e-05, 'epoch': 1.47}


 49%|████▉     | 59580/121875 [18:43:18<18:56:55,  1.10s/it]

{'loss': 0.753, 'learning_rate': 2.555692307692308e-05, 'epoch': 1.47}


 49%|████▉     | 59610/121875 [18:43:51<19:05:02,  1.10s/it]

{'loss': 0.6473, 'learning_rate': 2.5544615384615385e-05, 'epoch': 1.47}


 49%|████▉     | 59640/121875 [18:44:25<19:00:55,  1.10s/it]

{'loss': 0.6327, 'learning_rate': 2.5532307692307694e-05, 'epoch': 1.47}


 49%|████▉     | 59670/121875 [18:44:58<19:00:22,  1.10s/it]

{'loss': 0.7351, 'learning_rate': 2.552e-05, 'epoch': 1.47}


 49%|████▉     | 59700/121875 [18:45:31<19:03:23,  1.10s/it]

{'loss': 0.6379, 'learning_rate': 2.550769230769231e-05, 'epoch': 1.47}


 49%|████▉     | 59730/121875 [18:46:04<19:01:49,  1.10s/it]

{'loss': 0.6678, 'learning_rate': 2.5495384615384614e-05, 'epoch': 1.47}


 49%|████▉     | 59760/121875 [18:46:37<19:00:24,  1.10s/it]

{'loss': 0.6987, 'learning_rate': 2.5483076923076927e-05, 'epoch': 1.47}


 49%|████▉     | 59790/121875 [18:47:10<18:56:54,  1.10s/it]

{'loss': 0.6845, 'learning_rate': 2.5470769230769233e-05, 'epoch': 1.47}


 49%|████▉     | 59820/121875 [18:47:43<18:54:47,  1.10s/it]

{'loss': 0.7008, 'learning_rate': 2.5458461538461542e-05, 'epoch': 1.47}


 49%|████▉     | 59850/121875 [18:48:16<18:58:43,  1.10s/it]

{'loss': 0.7188, 'learning_rate': 2.5446153846153848e-05, 'epoch': 1.47}


 49%|████▉     | 59880/121875 [18:48:49<18:55:50,  1.10s/it]

{'loss': 0.73, 'learning_rate': 2.5433846153846153e-05, 'epoch': 1.47}


 49%|████▉     | 59910/121875 [18:49:22<18:51:03,  1.10s/it]

{'loss': 0.6864, 'learning_rate': 2.5421538461538462e-05, 'epoch': 1.47}


 49%|████▉     | 59940/121875 [18:49:55<18:58:13,  1.10s/it]

{'loss': 0.7222, 'learning_rate': 2.5409230769230768e-05, 'epoch': 1.48}


 49%|████▉     | 59970/121875 [18:50:28<18:56:34,  1.10s/it]

{'loss': 0.7619, 'learning_rate': 2.539692307692308e-05, 'epoch': 1.48}


 49%|████▉     | 60000/121875 [18:51:01<18:45:49,  1.09s/it]

{'loss': 0.7169, 'learning_rate': 2.5384615384615383e-05, 'epoch': 1.48}


 49%|████▉     | 60030/121875 [18:51:36<18:52:59,  1.10s/it]

{'loss': 0.7099, 'learning_rate': 2.5372307692307695e-05, 'epoch': 1.48}


 49%|████▉     | 60060/121875 [18:52:09<18:52:26,  1.10s/it]

{'loss': 0.7018, 'learning_rate': 2.536e-05, 'epoch': 1.48}


 49%|████▉     | 60090/121875 [18:52:42<18:47:05,  1.09s/it]

{'loss': 0.7066, 'learning_rate': 2.534769230769231e-05, 'epoch': 1.48}


 49%|████▉     | 60120/121875 [18:53:15<18:48:33,  1.10s/it]

{'loss': 0.7336, 'learning_rate': 2.5335384615384616e-05, 'epoch': 1.48}


 49%|████▉     | 60150/121875 [18:53:48<18:49:36,  1.10s/it]

{'loss': 0.6729, 'learning_rate': 2.5323076923076922e-05, 'epoch': 1.48}


 49%|████▉     | 60180/121875 [18:54:21<18:52:59,  1.10s/it]

{'loss': 0.6719, 'learning_rate': 2.531076923076923e-05, 'epoch': 1.48}


 49%|████▉     | 60210/121875 [18:54:54<18:49:01,  1.10s/it]

{'loss': 0.6619, 'learning_rate': 2.5298461538461537e-05, 'epoch': 1.48}


 49%|████▉     | 60240/121875 [18:55:27<18:52:57,  1.10s/it]

{'loss': 0.689, 'learning_rate': 2.528615384615385e-05, 'epoch': 1.48}


 49%|████▉     | 60270/121875 [18:56:00<18:49:29,  1.10s/it]

{'loss': 0.7403, 'learning_rate': 2.5273846153846155e-05, 'epoch': 1.48}


 49%|████▉     | 60300/121875 [18:56:33<18:49:14,  1.10s/it]

{'loss': 0.6826, 'learning_rate': 2.5261538461538464e-05, 'epoch': 1.48}


 50%|████▉     | 60330/121875 [18:57:06<18:50:46,  1.10s/it]

{'loss': 0.7041, 'learning_rate': 2.524923076923077e-05, 'epoch': 1.49}


 50%|████▉     | 60360/121875 [18:57:39<18:49:04,  1.10s/it]

{'loss': 0.7263, 'learning_rate': 2.523692307692308e-05, 'epoch': 1.49}


 50%|████▉     | 60390/121875 [18:58:12<18:45:58,  1.10s/it]

{'loss': 0.735, 'learning_rate': 2.5224615384615384e-05, 'epoch': 1.49}


 50%|████▉     | 60420/121875 [18:58:45<18:48:48,  1.10s/it]

{'loss': 0.6751, 'learning_rate': 2.5212307692307697e-05, 'epoch': 1.49}


 50%|████▉     | 60450/121875 [18:59:18<18:49:23,  1.10s/it]

{'loss': 0.7109, 'learning_rate': 2.5200000000000003e-05, 'epoch': 1.49}


 50%|████▉     | 60480/121875 [18:59:51<18:47:16,  1.10s/it]

{'loss': 0.6863, 'learning_rate': 2.5187692307692305e-05, 'epoch': 1.49}


 50%|████▉     | 60510/121875 [19:00:26<19:13:09,  1.13s/it]

{'loss': 0.6498, 'learning_rate': 2.5175384615384618e-05, 'epoch': 1.49}


 50%|████▉     | 60540/121875 [19:00:59<18:42:57,  1.10s/it]

{'loss': 0.7105, 'learning_rate': 2.5163076923076923e-05, 'epoch': 1.49}


 50%|████▉     | 60570/121875 [19:01:32<18:47:33,  1.10s/it]

{'loss': 0.7074, 'learning_rate': 2.5150769230769232e-05, 'epoch': 1.49}


 50%|████▉     | 60600/121875 [19:02:05<18:41:11,  1.10s/it]

{'loss': 0.711, 'learning_rate': 2.5138461538461538e-05, 'epoch': 1.49}


 50%|████▉     | 60630/121875 [19:02:38<18:41:17,  1.10s/it]

{'loss': 0.6745, 'learning_rate': 2.512615384615385e-05, 'epoch': 1.49}


 50%|████▉     | 60660/121875 [19:03:11<18:46:36,  1.10s/it]

{'loss': 0.7731, 'learning_rate': 2.5113846153846153e-05, 'epoch': 1.49}


 50%|████▉     | 60690/121875 [19:03:44<18:42:43,  1.10s/it]

{'loss': 0.7494, 'learning_rate': 2.5101538461538465e-05, 'epoch': 1.49}


 50%|████▉     | 60720/121875 [19:04:17<18:35:20,  1.09s/it]

{'loss': 0.6758, 'learning_rate': 2.508923076923077e-05, 'epoch': 1.49}


 50%|████▉     | 60750/121875 [19:04:50<18:38:13,  1.10s/it]

{'loss': 0.6472, 'learning_rate': 2.5076923076923077e-05, 'epoch': 1.5}


 50%|████▉     | 60780/121875 [19:05:23<18:39:39,  1.10s/it]

{'loss': 0.7521, 'learning_rate': 2.5064615384615386e-05, 'epoch': 1.5}


 50%|████▉     | 60810/121875 [19:05:56<18:32:30,  1.09s/it]

{'loss': 0.6924, 'learning_rate': 2.5052307692307692e-05, 'epoch': 1.5}


 50%|████▉     | 60840/121875 [19:06:29<18:44:03,  1.10s/it]

{'loss': 0.714, 'learning_rate': 2.504e-05, 'epoch': 1.5}


 50%|████▉     | 60870/121875 [19:07:02<18:38:54,  1.10s/it]

{'loss': 0.6348, 'learning_rate': 2.5027692307692307e-05, 'epoch': 1.5}


 50%|████▉     | 60900/121875 [19:07:35<18:31:31,  1.09s/it]

{'loss': 0.7083, 'learning_rate': 2.501538461538462e-05, 'epoch': 1.5}


 50%|████▉     | 60930/121875 [19:08:08<18:37:16,  1.10s/it]

{'loss': 0.7474, 'learning_rate': 2.5003076923076925e-05, 'epoch': 1.5}


 50%|█████     | 60960/121875 [19:08:41<18:37:28,  1.10s/it]

{'loss': 0.7168, 'learning_rate': 2.499076923076923e-05, 'epoch': 1.5}


 50%|█████     | 60990/121875 [19:09:14<18:37:15,  1.10s/it]

{'loss': 0.7379, 'learning_rate': 2.497846153846154e-05, 'epoch': 1.5}


 50%|█████     | 61020/121875 [19:09:49<18:32:11,  1.10s/it]

{'loss': 0.7368, 'learning_rate': 2.496615384615385e-05, 'epoch': 1.5}


 50%|█████     | 61050/121875 [19:10:22<18:36:52,  1.10s/it]

{'loss': 0.7257, 'learning_rate': 2.4953846153846154e-05, 'epoch': 1.5}


 50%|█████     | 61080/121875 [19:10:55<18:38:07,  1.10s/it]

{'loss': 0.7041, 'learning_rate': 2.4941538461538464e-05, 'epoch': 1.5}


 50%|█████     | 61110/121875 [19:11:28<18:31:55,  1.10s/it]

{'loss': 0.7232, 'learning_rate': 2.4929230769230773e-05, 'epoch': 1.5}


 50%|█████     | 61140/121875 [19:12:01<18:35:14,  1.10s/it]

{'loss': 0.6827, 'learning_rate': 2.491692307692308e-05, 'epoch': 1.5}


 50%|█████     | 61170/121875 [19:12:34<18:36:09,  1.10s/it]

{'loss': 0.6712, 'learning_rate': 2.4904615384615384e-05, 'epoch': 1.51}


 50%|█████     | 61200/121875 [19:13:07<18:36:06,  1.10s/it]

{'loss': 0.7252, 'learning_rate': 2.4892307692307693e-05, 'epoch': 1.51}


 50%|█████     | 61230/121875 [19:13:40<18:29:25,  1.10s/it]

{'loss': 0.6656, 'learning_rate': 2.488e-05, 'epoch': 1.51}


 50%|█████     | 61260/121875 [19:14:13<18:35:19,  1.10s/it]

{'loss': 0.6968, 'learning_rate': 2.4867692307692308e-05, 'epoch': 1.51}


 50%|█████     | 61290/121875 [19:14:46<18:29:18,  1.10s/it]

{'loss': 0.7138, 'learning_rate': 2.4855384615384617e-05, 'epoch': 1.51}


 50%|█████     | 61320/121875 [19:15:19<18:24:54,  1.09s/it]

{'loss': 0.723, 'learning_rate': 2.4843076923076923e-05, 'epoch': 1.51}


 50%|█████     | 61350/121875 [19:15:52<18:28:53,  1.10s/it]

{'loss': 0.7174, 'learning_rate': 2.4830769230769232e-05, 'epoch': 1.51}


 50%|█████     | 61380/121875 [19:16:25<18:33:37,  1.10s/it]

{'loss': 0.6933, 'learning_rate': 2.481846153846154e-05, 'epoch': 1.51}


 50%|█████     | 61410/121875 [19:16:58<18:25:02,  1.10s/it]

{'loss': 0.7395, 'learning_rate': 2.4806153846153847e-05, 'epoch': 1.51}


 50%|█████     | 61440/121875 [19:17:31<18:23:32,  1.10s/it]

{'loss': 0.7034, 'learning_rate': 2.4793846153846156e-05, 'epoch': 1.51}


 50%|█████     | 61470/121875 [19:18:04<18:28:05,  1.10s/it]

{'loss': 0.6863, 'learning_rate': 2.4781538461538462e-05, 'epoch': 1.51}


 50%|█████     | 61500/121875 [19:18:37<18:32:29,  1.11s/it]

{'loss': 0.6874, 'learning_rate': 2.476923076923077e-05, 'epoch': 1.51}


 50%|█████     | 61530/121875 [19:19:12<18:29:54,  1.10s/it]

{'loss': 0.7041, 'learning_rate': 2.4756923076923077e-05, 'epoch': 1.51}


 51%|█████     | 61560/121875 [19:19:45<18:28:20,  1.10s/it]

{'loss': 0.7052, 'learning_rate': 2.4744615384615386e-05, 'epoch': 1.52}


 51%|█████     | 61590/121875 [19:20:18<18:24:02,  1.10s/it]

{'loss': 0.6986, 'learning_rate': 2.4732307692307695e-05, 'epoch': 1.52}


 51%|█████     | 61620/121875 [19:20:51<18:26:35,  1.10s/it]

{'loss': 0.7047, 'learning_rate': 2.472e-05, 'epoch': 1.52}


 51%|█████     | 61650/121875 [19:21:24<18:27:09,  1.10s/it]

{'loss': 0.7502, 'learning_rate': 2.470769230769231e-05, 'epoch': 1.52}


 51%|█████     | 61680/121875 [19:21:57<18:28:14,  1.10s/it]

{'loss': 0.6608, 'learning_rate': 2.469538461538462e-05, 'epoch': 1.52}


 51%|█████     | 61710/121875 [19:22:30<18:23:14,  1.10s/it]

{'loss': 0.6154, 'learning_rate': 2.4683076923076924e-05, 'epoch': 1.52}


 51%|█████     | 61740/121875 [19:23:03<18:20:12,  1.10s/it]

{'loss': 0.7089, 'learning_rate': 2.467076923076923e-05, 'epoch': 1.52}


 51%|█████     | 61770/121875 [19:23:36<18:22:23,  1.10s/it]

{'loss': 0.7597, 'learning_rate': 2.465846153846154e-05, 'epoch': 1.52}


 51%|█████     | 61800/121875 [19:24:09<18:19:43,  1.10s/it]

{'loss': 0.7123, 'learning_rate': 2.4646153846153845e-05, 'epoch': 1.52}


 51%|█████     | 61830/121875 [19:24:42<18:19:55,  1.10s/it]

{'loss': 0.7216, 'learning_rate': 2.4633846153846154e-05, 'epoch': 1.52}


 51%|█████     | 61860/121875 [19:25:15<18:18:07,  1.10s/it]

{'loss': 0.6566, 'learning_rate': 2.4621538461538463e-05, 'epoch': 1.52}


 51%|█████     | 61890/121875 [19:25:48<18:15:46,  1.10s/it]

{'loss': 0.6868, 'learning_rate': 2.460923076923077e-05, 'epoch': 1.52}


 51%|█████     | 61920/121875 [19:26:21<18:23:04,  1.10s/it]

{'loss': 0.715, 'learning_rate': 2.4596923076923078e-05, 'epoch': 1.52}


 51%|█████     | 61950/121875 [19:26:54<18:17:25,  1.10s/it]

{'loss': 0.7389, 'learning_rate': 2.4584615384615387e-05, 'epoch': 1.52}


 51%|█████     | 61980/121875 [19:27:27<18:18:15,  1.10s/it]

{'loss': 0.6724, 'learning_rate': 2.4572307692307693e-05, 'epoch': 1.53}


 51%|█████     | 62010/121875 [19:28:02<18:45:16,  1.13s/it]

{'loss': 0.6856, 'learning_rate': 2.4560000000000002e-05, 'epoch': 1.53}


 51%|█████     | 62040/121875 [19:28:35<18:14:27,  1.10s/it]

{'loss': 0.6629, 'learning_rate': 2.4547692307692308e-05, 'epoch': 1.53}


 51%|█████     | 62070/121875 [19:29:08<18:16:29,  1.10s/it]

{'loss': 0.7363, 'learning_rate': 2.4535384615384617e-05, 'epoch': 1.53}


 51%|█████     | 62100/121875 [19:29:41<18:16:26,  1.10s/it]

{'loss': 0.695, 'learning_rate': 2.4523076923076923e-05, 'epoch': 1.53}


 51%|█████     | 62130/121875 [19:30:14<18:15:12,  1.10s/it]

{'loss': 0.6716, 'learning_rate': 2.4510769230769232e-05, 'epoch': 1.53}


 51%|█████     | 62160/121875 [19:30:47<18:17:11,  1.10s/it]

{'loss': 0.7062, 'learning_rate': 2.449846153846154e-05, 'epoch': 1.53}


 51%|█████     | 62190/121875 [19:31:21<18:19:59,  1.11s/it]

{'loss': 0.7595, 'learning_rate': 2.4486153846153847e-05, 'epoch': 1.53}


 51%|█████     | 62220/121875 [19:31:53<18:11:34,  1.10s/it]

{'loss': 0.693, 'learning_rate': 2.4473846153846156e-05, 'epoch': 1.53}


 51%|█████     | 62250/121875 [19:32:26<18:13:00,  1.10s/it]

{'loss': 0.6714, 'learning_rate': 2.4461538461538465e-05, 'epoch': 1.53}


 51%|█████     | 62280/121875 [19:32:59<18:11:44,  1.10s/it]

{'loss': 0.7516, 'learning_rate': 2.444923076923077e-05, 'epoch': 1.53}


 51%|█████     | 62310/121875 [19:33:32<18:11:54,  1.10s/it]

{'loss': 0.6998, 'learning_rate': 2.443692307692308e-05, 'epoch': 1.53}


 51%|█████     | 62340/121875 [19:34:05<18:10:48,  1.10s/it]

{'loss': 0.6916, 'learning_rate': 2.4424615384615385e-05, 'epoch': 1.53}


 51%|█████     | 62370/121875 [19:34:38<18:12:41,  1.10s/it]

{'loss': 0.7652, 'learning_rate': 2.441230769230769e-05, 'epoch': 1.54}


 51%|█████     | 62400/121875 [19:35:11<18:13:19,  1.10s/it]

{'loss': 0.7363, 'learning_rate': 2.44e-05, 'epoch': 1.54}


 51%|█████     | 62430/121875 [19:35:45<18:14:33,  1.10s/it]

{'loss': 0.6488, 'learning_rate': 2.438769230769231e-05, 'epoch': 1.54}


 51%|█████     | 62460/121875 [19:36:18<17:56:53,  1.09s/it]

{'loss': 0.6771, 'learning_rate': 2.4375384615384615e-05, 'epoch': 1.54}


 51%|█████▏    | 62490/121875 [19:36:51<18:12:09,  1.10s/it]

{'loss': 0.7124, 'learning_rate': 2.4363076923076924e-05, 'epoch': 1.54}


 51%|█████▏    | 62520/121875 [19:37:26<18:07:35,  1.10s/it]

{'loss': 0.6759, 'learning_rate': 2.4350769230769233e-05, 'epoch': 1.54}


 51%|█████▏    | 62550/121875 [19:37:59<17:57:05,  1.09s/it]

{'loss': 0.7011, 'learning_rate': 2.433846153846154e-05, 'epoch': 1.54}


 51%|█████▏    | 62580/121875 [19:38:32<18:11:30,  1.10s/it]

{'loss': 0.6974, 'learning_rate': 2.4326153846153848e-05, 'epoch': 1.54}


 51%|█████▏    | 62610/121875 [19:39:05<18:02:27,  1.10s/it]

{'loss': 0.7576, 'learning_rate': 2.4313846153846154e-05, 'epoch': 1.54}


 51%|█████▏    | 62640/121875 [19:39:38<18:04:43,  1.10s/it]

{'loss': 0.6671, 'learning_rate': 2.4301538461538463e-05, 'epoch': 1.54}


 51%|█████▏    | 62670/121875 [19:40:11<18:06:46,  1.10s/it]

{'loss': 0.715, 'learning_rate': 2.428923076923077e-05, 'epoch': 1.54}


 51%|█████▏    | 62700/121875 [19:40:44<17:52:57,  1.09s/it]

{'loss': 0.6831, 'learning_rate': 2.4276923076923078e-05, 'epoch': 1.54}


 51%|█████▏    | 62730/121875 [19:41:17<18:01:52,  1.10s/it]

{'loss': 0.6308, 'learning_rate': 2.4264615384615387e-05, 'epoch': 1.54}


 51%|█████▏    | 62760/121875 [19:41:50<18:01:36,  1.10s/it]

{'loss': 0.7325, 'learning_rate': 2.4252307692307693e-05, 'epoch': 1.54}


 52%|█████▏    | 62790/121875 [19:42:23<17:54:58,  1.09s/it]

{'loss': 0.7221, 'learning_rate': 2.4240000000000002e-05, 'epoch': 1.55}


 52%|█████▏    | 62820/121875 [19:42:56<18:06:29,  1.10s/it]

{'loss': 0.7191, 'learning_rate': 2.422769230769231e-05, 'epoch': 1.55}


 52%|█████▏    | 62850/121875 [19:43:29<17:59:36,  1.10s/it]

{'loss': 0.6795, 'learning_rate': 2.4215384615384617e-05, 'epoch': 1.55}


 52%|█████▏    | 62880/121875 [19:44:01<18:05:23,  1.10s/it]

{'loss': 0.7453, 'learning_rate': 2.4203076923076926e-05, 'epoch': 1.55}


 52%|█████▏    | 62910/121875 [19:44:34<18:06:42,  1.11s/it]

{'loss': 0.6944, 'learning_rate': 2.419076923076923e-05, 'epoch': 1.55}


 52%|█████▏    | 62940/121875 [19:45:08<17:59:44,  1.10s/it]

{'loss': 0.7202, 'learning_rate': 2.4178461538461537e-05, 'epoch': 1.55}


 52%|█████▏    | 62970/121875 [19:45:41<18:02:39,  1.10s/it]

{'loss': 0.6855, 'learning_rate': 2.4166153846153846e-05, 'epoch': 1.55}


 52%|█████▏    | 63000/121875 [19:46:14<18:02:06,  1.10s/it]

{'loss': 0.7444, 'learning_rate': 2.4153846153846155e-05, 'epoch': 1.55}


 52%|█████▏    | 63030/121875 [19:46:49<18:01:44,  1.10s/it]

{'loss': 0.6725, 'learning_rate': 2.414153846153846e-05, 'epoch': 1.55}


 52%|█████▏    | 63060/121875 [19:47:22<17:58:26,  1.10s/it]

{'loss': 0.6894, 'learning_rate': 2.412923076923077e-05, 'epoch': 1.55}


 52%|█████▏    | 63090/121875 [19:47:55<17:56:57,  1.10s/it]

{'loss': 0.7004, 'learning_rate': 2.411692307692308e-05, 'epoch': 1.55}


 52%|█████▏    | 63120/121875 [19:48:28<17:55:54,  1.10s/it]

{'loss': 0.7781, 'learning_rate': 2.4104615384615385e-05, 'epoch': 1.55}


 52%|█████▏    | 63150/121875 [19:49:01<17:53:04,  1.10s/it]

{'loss': 0.6955, 'learning_rate': 2.4092307692307694e-05, 'epoch': 1.55}


 52%|█████▏    | 63180/121875 [19:49:34<17:59:13,  1.10s/it]

{'loss': 0.694, 'learning_rate': 2.408e-05, 'epoch': 1.56}


 52%|█████▏    | 63210/121875 [19:50:07<17:53:17,  1.10s/it]

{'loss': 0.7138, 'learning_rate': 2.406769230769231e-05, 'epoch': 1.56}


 52%|█████▏    | 63240/121875 [19:50:40<17:47:35,  1.09s/it]

{'loss': 0.6769, 'learning_rate': 2.4055384615384615e-05, 'epoch': 1.56}


 52%|█████▏    | 63270/121875 [19:51:13<17:54:56,  1.10s/it]

{'loss': 0.6753, 'learning_rate': 2.4043076923076924e-05, 'epoch': 1.56}


 52%|█████▏    | 63300/121875 [19:51:46<17:51:16,  1.10s/it]

{'loss': 0.6881, 'learning_rate': 2.4030769230769233e-05, 'epoch': 1.56}


 52%|█████▏    | 63330/121875 [19:52:19<17:49:23,  1.10s/it]

{'loss': 0.678, 'learning_rate': 2.401846153846154e-05, 'epoch': 1.56}


 52%|█████▏    | 63360/121875 [19:52:52<17:51:15,  1.10s/it]

{'loss': 0.7466, 'learning_rate': 2.4006153846153848e-05, 'epoch': 1.56}


 52%|█████▏    | 63390/121875 [19:53:25<17:56:42,  1.10s/it]

{'loss': 0.6602, 'learning_rate': 2.3993846153846157e-05, 'epoch': 1.56}


 52%|█████▏    | 63420/121875 [19:53:58<17:52:01,  1.10s/it]

{'loss': 0.664, 'learning_rate': 2.3981538461538463e-05, 'epoch': 1.56}


 52%|█████▏    | 63450/121875 [19:54:30<17:46:51,  1.10s/it]

{'loss': 0.6964, 'learning_rate': 2.3969230769230772e-05, 'epoch': 1.56}


 52%|█████▏    | 63480/121875 [19:55:03<17:47:37,  1.10s/it]

{'loss': 0.7668, 'learning_rate': 2.3956923076923077e-05, 'epoch': 1.56}


 52%|█████▏    | 63510/121875 [19:55:39<18:12:43,  1.12s/it]

{'loss': 0.7345, 'learning_rate': 2.3944615384615383e-05, 'epoch': 1.56}


 52%|█████▏    | 63540/121875 [19:56:12<17:50:07,  1.10s/it]

{'loss': 0.7162, 'learning_rate': 2.3932307692307692e-05, 'epoch': 1.56}


 52%|█████▏    | 63570/121875 [19:56:45<17:49:51,  1.10s/it]

{'loss': 0.7322, 'learning_rate': 2.392e-05, 'epoch': 1.56}


 52%|█████▏    | 63600/121875 [19:57:18<17:42:34,  1.09s/it]

{'loss': 0.7001, 'learning_rate': 2.3907692307692307e-05, 'epoch': 1.57}


 52%|█████▏    | 63630/121875 [19:57:50<17:44:31,  1.10s/it]

{'loss': 0.6966, 'learning_rate': 2.3895384615384616e-05, 'epoch': 1.57}


 52%|█████▏    | 63660/121875 [19:58:23<17:44:01,  1.10s/it]

{'loss': 0.7269, 'learning_rate': 2.3883076923076925e-05, 'epoch': 1.57}


 52%|█████▏    | 63690/121875 [19:58:56<17:43:33,  1.10s/it]

{'loss': 0.7528, 'learning_rate': 2.387076923076923e-05, 'epoch': 1.57}


 52%|█████▏    | 63720/121875 [19:59:29<17:49:35,  1.10s/it]

{'loss': 0.6769, 'learning_rate': 2.385846153846154e-05, 'epoch': 1.57}


 52%|█████▏    | 63750/121875 [20:00:02<17:49:05,  1.10s/it]

{'loss': 0.7163, 'learning_rate': 2.384615384615385e-05, 'epoch': 1.57}


 52%|█████▏    | 63780/121875 [20:00:35<17:47:28,  1.10s/it]

{'loss': 0.7473, 'learning_rate': 2.3833846153846155e-05, 'epoch': 1.57}


 52%|█████▏    | 63810/121875 [20:01:08<17:36:13,  1.09s/it]

{'loss': 0.6854, 'learning_rate': 2.382153846153846e-05, 'epoch': 1.57}


 52%|█████▏    | 63840/121875 [20:01:41<17:39:45,  1.10s/it]

{'loss': 0.6656, 'learning_rate': 2.380923076923077e-05, 'epoch': 1.57}


 52%|█████▏    | 63870/121875 [20:02:14<17:44:44,  1.10s/it]

{'loss': 0.7785, 'learning_rate': 2.379692307692308e-05, 'epoch': 1.57}


 52%|█████▏    | 63900/121875 [20:02:47<17:41:36,  1.10s/it]

{'loss': 0.7046, 'learning_rate': 2.3784615384615385e-05, 'epoch': 1.57}


 52%|█████▏    | 63930/121875 [20:03:20<17:37:05,  1.09s/it]

{'loss': 0.7405, 'learning_rate': 2.3772307692307694e-05, 'epoch': 1.57}


 52%|█████▏    | 63960/121875 [20:03:53<17:39:46,  1.10s/it]

{'loss': 0.7016, 'learning_rate': 2.3760000000000003e-05, 'epoch': 1.57}


 53%|█████▎    | 63990/121875 [20:04:26<17:39:14,  1.10s/it]

{'loss': 0.6377, 'learning_rate': 2.374769230769231e-05, 'epoch': 1.58}


 53%|█████▎    | 64020/121875 [20:05:02<17:46:01,  1.11s/it]

{'loss': 0.7318, 'learning_rate': 2.3735384615384618e-05, 'epoch': 1.58}


 53%|█████▎    | 64050/121875 [20:05:35<17:41:00,  1.10s/it]

{'loss': 0.7123, 'learning_rate': 2.3723076923076923e-05, 'epoch': 1.58}


 53%|█████▎    | 64080/121875 [20:06:07<17:34:10,  1.09s/it]

{'loss': 0.7205, 'learning_rate': 2.371076923076923e-05, 'epoch': 1.58}


 53%|█████▎    | 64110/121875 [20:06:41<17:41:44,  1.10s/it]

{'loss': 0.6614, 'learning_rate': 2.369846153846154e-05, 'epoch': 1.58}


 53%|█████▎    | 64140/121875 [20:07:14<17:34:04,  1.10s/it]

{'loss': 0.763, 'learning_rate': 2.3686153846153847e-05, 'epoch': 1.58}


 53%|█████▎    | 64170/121875 [20:07:46<17:35:27,  1.10s/it]

{'loss': 0.7165, 'learning_rate': 2.3673846153846153e-05, 'epoch': 1.58}


 53%|█████▎    | 64200/121875 [20:08:19<17:36:41,  1.10s/it]

{'loss': 0.6879, 'learning_rate': 2.3661538461538462e-05, 'epoch': 1.58}


 53%|█████▎    | 64230/121875 [20:08:52<17:33:46,  1.10s/it]

{'loss': 0.8193, 'learning_rate': 2.364923076923077e-05, 'epoch': 1.58}


 53%|█████▎    | 64260/121875 [20:09:25<17:42:38,  1.11s/it]

{'loss': 0.7253, 'learning_rate': 2.3636923076923077e-05, 'epoch': 1.58}


 53%|█████▎    | 64290/121875 [20:09:58<17:35:10,  1.10s/it]

{'loss': 0.6787, 'learning_rate': 2.3624615384615386e-05, 'epoch': 1.58}


 53%|█████▎    | 64320/121875 [20:10:31<17:37:55,  1.10s/it]

{'loss': 0.7471, 'learning_rate': 2.3612307692307695e-05, 'epoch': 1.58}


 53%|█████▎    | 64350/121875 [20:11:04<17:35:33,  1.10s/it]

{'loss': 0.6989, 'learning_rate': 2.36e-05, 'epoch': 1.58}


 53%|█████▎    | 64380/121875 [20:11:37<17:33:39,  1.10s/it]

{'loss': 0.7471, 'learning_rate': 2.3587692307692307e-05, 'epoch': 1.58}


 53%|█████▎    | 64410/121875 [20:12:10<17:32:22,  1.10s/it]

{'loss': 0.7028, 'learning_rate': 2.3575384615384616e-05, 'epoch': 1.59}


 53%|█████▎    | 64440/121875 [20:12:43<17:23:08,  1.09s/it]

{'loss': 0.6561, 'learning_rate': 2.3563076923076925e-05, 'epoch': 1.59}


 53%|█████▎    | 64470/121875 [20:13:16<17:33:57,  1.10s/it]

{'loss': 0.7039, 'learning_rate': 2.355076923076923e-05, 'epoch': 1.59}


 53%|█████▎    | 64500/121875 [20:13:49<17:30:13,  1.10s/it]

{'loss': 0.714, 'learning_rate': 2.353846153846154e-05, 'epoch': 1.59}


 53%|█████▎    | 64530/121875 [20:14:24<17:20:58,  1.09s/it]

{'loss': 0.7614, 'learning_rate': 2.352615384615385e-05, 'epoch': 1.59}


 53%|█████▎    | 64560/121875 [20:14:57<17:33:52,  1.10s/it]

{'loss': 0.7222, 'learning_rate': 2.3513846153846155e-05, 'epoch': 1.59}


 53%|█████▎    | 64590/121875 [20:15:30<17:30:52,  1.10s/it]

{'loss': 0.7689, 'learning_rate': 2.3501538461538464e-05, 'epoch': 1.59}


 53%|█████▎    | 64620/121875 [20:16:03<17:23:28,  1.09s/it]

{'loss': 0.7283, 'learning_rate': 2.3489230769230773e-05, 'epoch': 1.59}


 53%|█████▎    | 64650/121875 [20:16:36<17:30:18,  1.10s/it]

{'loss': 0.6628, 'learning_rate': 2.3476923076923075e-05, 'epoch': 1.59}


 53%|█████▎    | 64680/121875 [20:17:09<17:21:13,  1.09s/it]

{'loss': 0.7145, 'learning_rate': 2.3464615384615384e-05, 'epoch': 1.59}


 53%|█████▎    | 64710/121875 [20:17:42<17:31:57,  1.10s/it]

{'loss': 0.6066, 'learning_rate': 2.3452307692307693e-05, 'epoch': 1.59}


 53%|█████▎    | 64740/121875 [20:18:15<17:25:02,  1.10s/it]

{'loss': 0.6687, 'learning_rate': 2.344e-05, 'epoch': 1.59}


 53%|█████▎    | 64770/121875 [20:18:48<17:23:49,  1.10s/it]

{'loss': 0.7056, 'learning_rate': 2.342769230769231e-05, 'epoch': 1.59}


 53%|█████▎    | 64800/121875 [20:19:21<17:26:06,  1.10s/it]

{'loss': 0.7189, 'learning_rate': 2.3415384615384617e-05, 'epoch': 1.6}


 53%|█████▎    | 64830/121875 [20:19:54<17:22:12,  1.10s/it]

{'loss': 0.6761, 'learning_rate': 2.3403076923076923e-05, 'epoch': 1.6}


 53%|█████▎    | 64860/121875 [20:20:27<17:23:34,  1.10s/it]

{'loss': 0.6694, 'learning_rate': 2.3390769230769232e-05, 'epoch': 1.6}


 53%|█████▎    | 64890/121875 [20:21:00<17:24:33,  1.10s/it]

{'loss': 0.7376, 'learning_rate': 2.337846153846154e-05, 'epoch': 1.6}


 53%|█████▎    | 64920/121875 [20:21:33<17:28:54,  1.10s/it]

{'loss': 0.7263, 'learning_rate': 2.3366153846153847e-05, 'epoch': 1.6}


 53%|█████▎    | 64950/121875 [20:22:06<17:27:01,  1.10s/it]

{'loss': 0.7251, 'learning_rate': 2.3353846153846153e-05, 'epoch': 1.6}


 53%|█████▎    | 64980/121875 [20:22:39<17:23:39,  1.10s/it]

{'loss': 0.6385, 'learning_rate': 2.3341538461538462e-05, 'epoch': 1.6}


 53%|█████▎    | 65010/121875 [20:23:14<17:53:11,  1.13s/it]

{'loss': 0.6921, 'learning_rate': 2.332923076923077e-05, 'epoch': 1.6}


 53%|█████▎    | 65040/121875 [20:23:47<17:26:33,  1.10s/it]

{'loss': 0.6818, 'learning_rate': 2.3316923076923077e-05, 'epoch': 1.6}


 53%|█████▎    | 65070/121875 [20:24:20<17:21:51,  1.10s/it]

{'loss': 0.7012, 'learning_rate': 2.3304615384615386e-05, 'epoch': 1.6}


 53%|█████▎    | 65100/121875 [20:24:53<17:18:23,  1.10s/it]

{'loss': 0.6989, 'learning_rate': 2.3292307692307695e-05, 'epoch': 1.6}


 53%|█████▎    | 65130/121875 [20:25:26<17:22:41,  1.10s/it]

{'loss': 0.7375, 'learning_rate': 2.328e-05, 'epoch': 1.6}


 53%|█████▎    | 65160/121875 [20:25:59<17:18:09,  1.10s/it]

{'loss': 0.6608, 'learning_rate': 2.326769230769231e-05, 'epoch': 1.6}


 53%|█████▎    | 65190/121875 [20:26:32<17:19:58,  1.10s/it]

{'loss': 0.6699, 'learning_rate': 2.325538461538462e-05, 'epoch': 1.6}


 54%|█████▎    | 65220/121875 [20:27:05<17:08:55,  1.09s/it]

{'loss': 0.7322, 'learning_rate': 2.324307692307692e-05, 'epoch': 1.61}


 54%|█████▎    | 65250/121875 [20:27:38<17:15:25,  1.10s/it]

{'loss': 0.7252, 'learning_rate': 2.323076923076923e-05, 'epoch': 1.61}


 54%|█████▎    | 65280/121875 [20:28:11<17:10:45,  1.09s/it]

{'loss': 0.7139, 'learning_rate': 2.321846153846154e-05, 'epoch': 1.61}


 54%|█████▎    | 65310/121875 [20:28:44<17:23:00,  1.11s/it]

{'loss': 0.6713, 'learning_rate': 2.3206153846153845e-05, 'epoch': 1.61}


 54%|█████▎    | 65340/121875 [20:29:17<17:15:00,  1.10s/it]

{'loss': 0.6828, 'learning_rate': 2.3193846153846154e-05, 'epoch': 1.61}


 54%|█████▎    | 65370/121875 [20:29:50<17:15:24,  1.10s/it]

{'loss': 0.6413, 'learning_rate': 2.3181538461538463e-05, 'epoch': 1.61}


 54%|█████▎    | 65400/121875 [20:30:23<17:15:32,  1.10s/it]

{'loss': 0.7202, 'learning_rate': 2.316923076923077e-05, 'epoch': 1.61}


 54%|█████▎    | 65430/121875 [20:30:56<17:16:00,  1.10s/it]

{'loss': 0.6768, 'learning_rate': 2.3156923076923078e-05, 'epoch': 1.61}


 54%|█████▎    | 65460/121875 [20:31:29<17:14:19,  1.10s/it]

{'loss': 0.7281, 'learning_rate': 2.3144615384615387e-05, 'epoch': 1.61}


 54%|█████▎    | 65490/121875 [20:32:02<17:14:18,  1.10s/it]

{'loss': 0.6417, 'learning_rate': 2.3132307692307693e-05, 'epoch': 1.61}


 54%|█████▍    | 65520/121875 [20:32:37<17:12:07,  1.10s/it]

{'loss': 0.6844, 'learning_rate': 2.312e-05, 'epoch': 1.61}


 54%|█████▍    | 65550/121875 [20:33:10<17:08:43,  1.10s/it]

{'loss': 0.691, 'learning_rate': 2.3107692307692308e-05, 'epoch': 1.61}


 54%|█████▍    | 65580/121875 [20:33:43<17:04:59,  1.09s/it]

{'loss': 0.7858, 'learning_rate': 2.3095384615384617e-05, 'epoch': 1.61}


 54%|█████▍    | 65610/121875 [20:34:16<17:13:42,  1.10s/it]

{'loss': 0.6695, 'learning_rate': 2.3083076923076923e-05, 'epoch': 1.62}


 54%|█████▍    | 65640/121875 [20:34:49<17:09:23,  1.10s/it]

{'loss': 0.6292, 'learning_rate': 2.3070769230769232e-05, 'epoch': 1.62}


 54%|█████▍    | 65670/121875 [20:35:22<17:08:57,  1.10s/it]

{'loss': 0.7078, 'learning_rate': 2.305846153846154e-05, 'epoch': 1.62}


 54%|█████▍    | 65700/121875 [20:35:55<17:07:54,  1.10s/it]

{'loss': 0.6571, 'learning_rate': 2.3046153846153847e-05, 'epoch': 1.62}


 54%|█████▍    | 65730/121875 [20:36:28<17:04:47,  1.10s/it]

{'loss': 0.6717, 'learning_rate': 2.3033846153846156e-05, 'epoch': 1.62}


 54%|█████▍    | 65760/121875 [20:37:01<17:11:49,  1.10s/it]

{'loss': 0.6816, 'learning_rate': 2.3021538461538465e-05, 'epoch': 1.62}


 54%|█████▍    | 65790/121875 [20:37:34<17:06:48,  1.10s/it]

{'loss': 0.7598, 'learning_rate': 2.300923076923077e-05, 'epoch': 1.62}


 54%|█████▍    | 65820/121875 [20:38:07<17:08:14,  1.10s/it]

{'loss': 0.669, 'learning_rate': 2.2996923076923076e-05, 'epoch': 1.62}


 54%|█████▍    | 65850/121875 [20:38:40<17:05:52,  1.10s/it]

{'loss': 0.7293, 'learning_rate': 2.2984615384615386e-05, 'epoch': 1.62}


 54%|█████▍    | 65880/121875 [20:39:13<17:07:23,  1.10s/it]

{'loss': 0.7366, 'learning_rate': 2.297230769230769e-05, 'epoch': 1.62}


 54%|█████▍    | 65910/121875 [20:39:46<17:08:09,  1.10s/it]

{'loss': 0.7271, 'learning_rate': 2.296e-05, 'epoch': 1.62}


 54%|█████▍    | 65940/121875 [20:40:19<17:06:06,  1.10s/it]

{'loss': 0.6821, 'learning_rate': 2.294769230769231e-05, 'epoch': 1.62}


 54%|█████▍    | 65970/121875 [20:40:52<17:00:19,  1.10s/it]

{'loss': 0.6531, 'learning_rate': 2.2935384615384615e-05, 'epoch': 1.62}


 54%|█████▍    | 66000/121875 [20:41:25<17:04:17,  1.10s/it]

{'loss': 0.7518, 'learning_rate': 2.2923076923076924e-05, 'epoch': 1.62}


 54%|█████▍    | 66030/121875 [20:42:00<17:02:44,  1.10s/it]

{'loss': 0.7444, 'learning_rate': 2.2910769230769233e-05, 'epoch': 1.63}


 54%|█████▍    | 66060/121875 [20:42:33<17:00:50,  1.10s/it]

{'loss': 0.7658, 'learning_rate': 2.289846153846154e-05, 'epoch': 1.63}


 54%|█████▍    | 66090/121875 [20:43:06<17:04:37,  1.10s/it]

{'loss': 0.7069, 'learning_rate': 2.2886153846153845e-05, 'epoch': 1.63}


 54%|█████▍    | 66120/121875 [20:43:39<16:58:52,  1.10s/it]

{'loss': 0.7141, 'learning_rate': 2.2873846153846154e-05, 'epoch': 1.63}


 54%|█████▍    | 66150/121875 [20:44:12<16:52:09,  1.09s/it]

{'loss': 0.6703, 'learning_rate': 2.2861538461538463e-05, 'epoch': 1.63}


 54%|█████▍    | 66180/121875 [20:44:45<17:01:14,  1.10s/it]

{'loss': 0.771, 'learning_rate': 2.284923076923077e-05, 'epoch': 1.63}


 54%|█████▍    | 66210/121875 [20:45:18<16:57:32,  1.10s/it]

{'loss': 0.7028, 'learning_rate': 2.2836923076923078e-05, 'epoch': 1.63}


 54%|█████▍    | 66240/121875 [20:45:51<17:00:44,  1.10s/it]

{'loss': 0.714, 'learning_rate': 2.2824615384615387e-05, 'epoch': 1.63}


 54%|█████▍    | 66270/121875 [20:46:24<16:54:11,  1.09s/it]

{'loss': 0.6518, 'learning_rate': 2.2812307692307693e-05, 'epoch': 1.63}


 54%|█████▍    | 66300/121875 [20:46:57<17:01:00,  1.10s/it]

{'loss': 0.7571, 'learning_rate': 2.2800000000000002e-05, 'epoch': 1.63}


 54%|█████▍    | 66330/121875 [20:47:30<16:57:50,  1.10s/it]

{'loss': 0.6756, 'learning_rate': 2.278769230769231e-05, 'epoch': 1.63}


 54%|█████▍    | 66360/121875 [20:48:03<16:59:18,  1.10s/it]

{'loss': 0.7107, 'learning_rate': 2.2775384615384617e-05, 'epoch': 1.63}


 54%|█████▍    | 66390/121875 [20:48:36<16:58:22,  1.10s/it]

{'loss': 0.7664, 'learning_rate': 2.2763076923076922e-05, 'epoch': 1.63}


 54%|█████▍    | 66420/121875 [20:49:09<17:00:37,  1.10s/it]

{'loss': 0.672, 'learning_rate': 2.275076923076923e-05, 'epoch': 1.63}


 55%|█████▍    | 66450/121875 [20:49:42<16:53:33,  1.10s/it]

{'loss': 0.6763, 'learning_rate': 2.273846153846154e-05, 'epoch': 1.64}


 55%|█████▍    | 66480/121875 [20:50:15<16:57:50,  1.10s/it]

{'loss': 0.7512, 'learning_rate': 2.2726153846153846e-05, 'epoch': 1.64}


 55%|█████▍    | 66510/121875 [20:50:50<17:18:59,  1.13s/it]

{'loss': 0.7107, 'learning_rate': 2.2713846153846156e-05, 'epoch': 1.64}


 55%|█████▍    | 66540/121875 [20:51:23<17:00:07,  1.11s/it]

{'loss': 0.7053, 'learning_rate': 2.270153846153846e-05, 'epoch': 1.64}


 55%|█████▍    | 66570/121875 [20:51:56<16:51:17,  1.10s/it]

{'loss': 0.7381, 'learning_rate': 2.268923076923077e-05, 'epoch': 1.64}


 55%|█████▍    | 66600/121875 [20:52:29<16:50:59,  1.10s/it]

{'loss': 0.6989, 'learning_rate': 2.267692307692308e-05, 'epoch': 1.64}


 55%|█████▍    | 66630/121875 [20:53:02<16:56:29,  1.10s/it]

{'loss': 0.714, 'learning_rate': 2.2664615384615385e-05, 'epoch': 1.64}


 55%|█████▍    | 66660/121875 [20:53:35<16:54:51,  1.10s/it]

{'loss': 0.7033, 'learning_rate': 2.265230769230769e-05, 'epoch': 1.64}


 55%|█████▍    | 66690/121875 [20:54:08<16:51:19,  1.10s/it]

{'loss': 0.7379, 'learning_rate': 2.264e-05, 'epoch': 1.64}


 55%|█████▍    | 66720/121875 [20:54:41<16:54:02,  1.10s/it]

{'loss': 0.6349, 'learning_rate': 2.262769230769231e-05, 'epoch': 1.64}


 55%|█████▍    | 66750/121875 [20:55:14<16:48:41,  1.10s/it]

{'loss': 0.6506, 'learning_rate': 2.2615384615384615e-05, 'epoch': 1.64}


 55%|█████▍    | 66780/121875 [20:55:47<16:51:40,  1.10s/it]

{'loss': 0.6542, 'learning_rate': 2.2603076923076924e-05, 'epoch': 1.64}


 55%|█████▍    | 66810/121875 [20:56:20<16:51:35,  1.10s/it]

{'loss': 0.6553, 'learning_rate': 2.2590769230769233e-05, 'epoch': 1.64}


 55%|█████▍    | 66840/121875 [20:56:54<16:50:31,  1.10s/it]

{'loss': 0.6868, 'learning_rate': 2.257846153846154e-05, 'epoch': 1.65}


 55%|█████▍    | 66870/121875 [20:57:27<16:47:39,  1.10s/it]

{'loss': 0.726, 'learning_rate': 2.2566153846153848e-05, 'epoch': 1.65}


 55%|█████▍    | 66900/121875 [20:58:00<16:47:54,  1.10s/it]

{'loss': 0.7409, 'learning_rate': 2.2553846153846157e-05, 'epoch': 1.65}


 55%|█████▍    | 66930/121875 [20:58:33<16:43:06,  1.10s/it]

{'loss': 0.6425, 'learning_rate': 2.2541538461538463e-05, 'epoch': 1.65}


 55%|█████▍    | 66960/121875 [20:59:06<16:43:42,  1.10s/it]

{'loss': 0.7365, 'learning_rate': 2.252923076923077e-05, 'epoch': 1.65}


 55%|█████▍    | 66990/121875 [20:59:39<16:48:51,  1.10s/it]

{'loss': 0.6974, 'learning_rate': 2.2516923076923078e-05, 'epoch': 1.65}


 55%|█████▍    | 67020/121875 [21:00:14<16:46:28,  1.10s/it]

{'loss': 0.703, 'learning_rate': 2.2504615384615387e-05, 'epoch': 1.65}


 55%|█████▌    | 67050/121875 [21:00:47<16:48:06,  1.10s/it]

{'loss': 0.7422, 'learning_rate': 2.2492307692307692e-05, 'epoch': 1.65}


 55%|█████▌    | 67080/121875 [21:01:20<16:37:21,  1.09s/it]

{'loss': 0.6453, 'learning_rate': 2.248e-05, 'epoch': 1.65}


 55%|█████▌    | 67110/121875 [21:01:53<16:44:05,  1.10s/it]

{'loss': 0.74, 'learning_rate': 2.246769230769231e-05, 'epoch': 1.65}


 55%|█████▌    | 67140/121875 [21:02:26<16:42:00,  1.10s/it]

{'loss': 0.705, 'learning_rate': 2.2455384615384616e-05, 'epoch': 1.65}


 55%|█████▌    | 67170/121875 [21:02:59<16:43:09,  1.10s/it]

{'loss': 0.6719, 'learning_rate': 2.2443076923076926e-05, 'epoch': 1.65}


 55%|█████▌    | 67200/121875 [21:03:32<16:45:36,  1.10s/it]

{'loss': 0.664, 'learning_rate': 2.243076923076923e-05, 'epoch': 1.65}


 55%|█████▌    | 67230/121875 [21:04:05<16:44:45,  1.10s/it]

{'loss': 0.664, 'learning_rate': 2.2418461538461537e-05, 'epoch': 1.65}


 55%|█████▌    | 67260/121875 [21:04:38<16:44:12,  1.10s/it]

{'loss': 0.6241, 'learning_rate': 2.2406153846153846e-05, 'epoch': 1.66}


 55%|█████▌    | 67290/121875 [21:05:11<16:41:12,  1.10s/it]

{'loss': 0.6956, 'learning_rate': 2.2393846153846155e-05, 'epoch': 1.66}


 55%|█████▌    | 67320/121875 [21:05:44<16:37:33,  1.10s/it]

{'loss': 0.7223, 'learning_rate': 2.238153846153846e-05, 'epoch': 1.66}


 55%|█████▌    | 67350/121875 [21:06:17<16:41:07,  1.10s/it]

{'loss': 0.7614, 'learning_rate': 2.236923076923077e-05, 'epoch': 1.66}


 55%|█████▌    | 67380/121875 [21:06:50<16:36:02,  1.10s/it]

{'loss': 0.6847, 'learning_rate': 2.235692307692308e-05, 'epoch': 1.66}


 55%|█████▌    | 67410/121875 [21:07:23<16:33:13,  1.09s/it]

{'loss': 0.6602, 'learning_rate': 2.2344615384615385e-05, 'epoch': 1.66}


 55%|█████▌    | 67440/121875 [21:07:56<16:39:36,  1.10s/it]

{'loss': 0.6621, 'learning_rate': 2.2332307692307694e-05, 'epoch': 1.66}


 55%|█████▌    | 67470/121875 [21:08:29<16:37:43,  1.10s/it]

{'loss': 0.7253, 'learning_rate': 2.2320000000000003e-05, 'epoch': 1.66}


 55%|█████▌    | 67500/121875 [21:09:02<16:37:42,  1.10s/it]

{'loss': 0.7063, 'learning_rate': 2.230769230769231e-05, 'epoch': 1.66}


 55%|█████▌    | 67530/121875 [21:09:37<16:36:15,  1.10s/it]

{'loss': 0.6944, 'learning_rate': 2.2295384615384615e-05, 'epoch': 1.66}


 55%|█████▌    | 67560/121875 [21:10:10<16:31:05,  1.09s/it]

{'loss': 0.7147, 'learning_rate': 2.2283076923076924e-05, 'epoch': 1.66}


 55%|█████▌    | 67590/121875 [21:10:43<16:37:58,  1.10s/it]

{'loss': 0.7017, 'learning_rate': 2.2270769230769233e-05, 'epoch': 1.66}


 55%|█████▌    | 67620/121875 [21:11:16<16:37:00,  1.10s/it]

{'loss': 0.7498, 'learning_rate': 2.225846153846154e-05, 'epoch': 1.66}


 56%|█████▌    | 67650/121875 [21:11:49<16:33:23,  1.10s/it]

{'loss': 0.7663, 'learning_rate': 2.2246153846153848e-05, 'epoch': 1.67}


 56%|█████▌    | 67680/121875 [21:12:22<16:36:09,  1.10s/it]

{'loss': 0.7018, 'learning_rate': 2.2233846153846157e-05, 'epoch': 1.67}


 56%|█████▌    | 67710/121875 [21:12:55<16:26:46,  1.09s/it]

{'loss': 0.6924, 'learning_rate': 2.2221538461538462e-05, 'epoch': 1.67}


 56%|█████▌    | 67740/121875 [21:13:29<16:33:21,  1.10s/it]

{'loss': 0.6894, 'learning_rate': 2.220923076923077e-05, 'epoch': 1.67}


 56%|█████▌    | 67770/121875 [21:14:02<16:33:14,  1.10s/it]

{'loss': 0.7134, 'learning_rate': 2.219692307692308e-05, 'epoch': 1.67}


 56%|█████▌    | 67800/121875 [21:14:35<16:35:19,  1.10s/it]

{'loss': 0.7465, 'learning_rate': 2.2184615384615386e-05, 'epoch': 1.67}


 56%|█████▌    | 67830/121875 [21:15:08<16:29:11,  1.10s/it]

{'loss': 0.6637, 'learning_rate': 2.2172307692307692e-05, 'epoch': 1.67}


 56%|█████▌    | 67860/121875 [21:15:41<16:25:56,  1.10s/it]

{'loss': 0.7294, 'learning_rate': 2.216e-05, 'epoch': 1.67}


 56%|█████▌    | 67890/121875 [21:16:14<16:29:29,  1.10s/it]

{'loss': 0.7221, 'learning_rate': 2.2147692307692307e-05, 'epoch': 1.67}


 56%|█████▌    | 67920/121875 [21:16:47<16:24:43,  1.10s/it]

{'loss': 0.6988, 'learning_rate': 2.2135384615384616e-05, 'epoch': 1.67}


 56%|█████▌    | 67950/121875 [21:17:20<16:27:37,  1.10s/it]

{'loss': 0.6822, 'learning_rate': 2.2123076923076925e-05, 'epoch': 1.67}


 56%|█████▌    | 67980/121875 [21:17:53<16:24:20,  1.10s/it]

{'loss': 0.7719, 'learning_rate': 2.211076923076923e-05, 'epoch': 1.67}


 56%|█████▌    | 68010/121875 [21:18:28<16:53:10,  1.13s/it]

{'loss': 0.7824, 'learning_rate': 2.209846153846154e-05, 'epoch': 1.67}


 56%|█████▌    | 68040/121875 [21:19:01<16:20:11,  1.09s/it]

{'loss': 0.7368, 'learning_rate': 2.208615384615385e-05, 'epoch': 1.67}


 56%|█████▌    | 68070/121875 [21:19:34<16:24:13,  1.10s/it]

{'loss': 0.7461, 'learning_rate': 2.2073846153846155e-05, 'epoch': 1.68}


 56%|█████▌    | 68100/121875 [21:20:07<16:19:42,  1.09s/it]

{'loss': 0.6438, 'learning_rate': 2.206153846153846e-05, 'epoch': 1.68}


 56%|█████▌    | 68130/121875 [21:20:40<16:19:04,  1.09s/it]

{'loss': 0.697, 'learning_rate': 2.204923076923077e-05, 'epoch': 1.68}


 56%|█████▌    | 68160/121875 [21:21:13<16:23:12,  1.10s/it]

{'loss': 0.7251, 'learning_rate': 2.203692307692308e-05, 'epoch': 1.68}


 56%|█████▌    | 68190/121875 [21:21:46<16:23:09,  1.10s/it]

{'loss': 0.7014, 'learning_rate': 2.2024615384615385e-05, 'epoch': 1.68}


 56%|█████▌    | 68220/121875 [21:22:19<16:19:24,  1.10s/it]

{'loss': 0.6081, 'learning_rate': 2.2012307692307694e-05, 'epoch': 1.68}


 56%|█████▌    | 68250/121875 [21:22:52<16:27:17,  1.10s/it]

{'loss': 0.6808, 'learning_rate': 2.2000000000000003e-05, 'epoch': 1.68}


 56%|█████▌    | 68280/121875 [21:23:25<16:22:24,  1.10s/it]

{'loss': 0.6773, 'learning_rate': 2.198769230769231e-05, 'epoch': 1.68}


 56%|█████▌    | 68310/121875 [21:23:58<16:18:41,  1.10s/it]

{'loss': 0.7295, 'learning_rate': 2.1975384615384618e-05, 'epoch': 1.68}


 56%|█████▌    | 68340/121875 [21:24:31<16:24:52,  1.10s/it]

{'loss': 0.7152, 'learning_rate': 2.1963076923076927e-05, 'epoch': 1.68}


 56%|█████▌    | 68370/121875 [21:25:04<16:23:56,  1.10s/it]

{'loss': 0.6954, 'learning_rate': 2.1950769230769232e-05, 'epoch': 1.68}


 56%|█████▌    | 68400/121875 [21:25:37<16:19:50,  1.10s/it]

{'loss': 0.6874, 'learning_rate': 2.1938461538461538e-05, 'epoch': 1.68}


 56%|█████▌    | 68430/121875 [21:26:10<16:16:53,  1.10s/it]

{'loss': 0.7317, 'learning_rate': 2.1926153846153847e-05, 'epoch': 1.68}


 56%|█████▌    | 68460/121875 [21:26:43<16:20:12,  1.10s/it]

{'loss': 0.8113, 'learning_rate': 2.1913846153846153e-05, 'epoch': 1.69}


 56%|█████▌    | 68490/121875 [21:27:16<16:21:41,  1.10s/it]

{'loss': 0.6564, 'learning_rate': 2.1901538461538462e-05, 'epoch': 1.69}


 56%|█████▌    | 68520/121875 [21:27:51<16:14:36,  1.10s/it]

{'loss': 0.6942, 'learning_rate': 2.188923076923077e-05, 'epoch': 1.69}


 56%|█████▌    | 68550/121875 [21:28:24<16:21:51,  1.10s/it]

{'loss': 0.72, 'learning_rate': 2.1876923076923077e-05, 'epoch': 1.69}


 56%|█████▋    | 68580/121875 [21:28:57<16:13:38,  1.10s/it]

{'loss': 0.6915, 'learning_rate': 2.1864615384615386e-05, 'epoch': 1.69}


 56%|█████▋    | 68610/121875 [21:29:30<16:13:34,  1.10s/it]

{'loss': 0.6936, 'learning_rate': 2.1852307692307695e-05, 'epoch': 1.69}


 56%|█████▋    | 68640/121875 [21:30:03<16:16:35,  1.10s/it]

{'loss': 0.6624, 'learning_rate': 2.184e-05, 'epoch': 1.69}


 56%|█████▋    | 68670/121875 [21:30:36<16:13:33,  1.10s/it]

{'loss': 0.7147, 'learning_rate': 2.182769230769231e-05, 'epoch': 1.69}


 56%|█████▋    | 68700/121875 [21:31:09<16:15:15,  1.10s/it]

{'loss': 0.7072, 'learning_rate': 2.1815384615384616e-05, 'epoch': 1.69}


 56%|█████▋    | 68730/121875 [21:31:42<16:09:50,  1.09s/it]

{'loss': 0.7203, 'learning_rate': 2.1803076923076925e-05, 'epoch': 1.69}


 56%|█████▋    | 68760/121875 [21:32:15<16:04:44,  1.09s/it]

{'loss': 0.7073, 'learning_rate': 2.179076923076923e-05, 'epoch': 1.69}


 56%|█████▋    | 68790/121875 [21:32:48<16:14:57,  1.10s/it]

{'loss': 0.7075, 'learning_rate': 2.177846153846154e-05, 'epoch': 1.69}


 56%|█████▋    | 68820/121875 [21:33:21<16:09:16,  1.10s/it]

{'loss': 0.6964, 'learning_rate': 2.176615384615385e-05, 'epoch': 1.69}


 56%|█████▋    | 68850/121875 [21:33:54<16:10:25,  1.10s/it]

{'loss': 0.7181, 'learning_rate': 2.1753846153846155e-05, 'epoch': 1.69}


 57%|█████▋    | 68880/121875 [21:34:27<16:07:09,  1.10s/it]

{'loss': 0.707, 'learning_rate': 2.1741538461538464e-05, 'epoch': 1.7}


 57%|█████▋    | 68910/121875 [21:35:00<16:06:40,  1.10s/it]

{'loss': 0.7194, 'learning_rate': 2.1729230769230773e-05, 'epoch': 1.7}


 57%|█████▋    | 68940/121875 [21:35:33<16:12:11,  1.10s/it]

{'loss': 0.6681, 'learning_rate': 2.171692307692308e-05, 'epoch': 1.7}


 57%|█████▋    | 68970/121875 [21:36:06<16:07:16,  1.10s/it]

{'loss': 0.6949, 'learning_rate': 2.1704615384615384e-05, 'epoch': 1.7}


 57%|█████▋    | 69000/121875 [21:36:39<16:06:17,  1.10s/it]

{'loss': 0.7764, 'learning_rate': 2.1692307692307693e-05, 'epoch': 1.7}


 57%|█████▋    | 69030/121875 [21:37:14<16:07:50,  1.10s/it]

{'loss': 0.7123, 'learning_rate': 2.168e-05, 'epoch': 1.7}


 57%|█████▋    | 69060/121875 [21:37:47<16:07:00,  1.10s/it]

{'loss': 0.6673, 'learning_rate': 2.1667692307692308e-05, 'epoch': 1.7}


 57%|█████▋    | 69090/121875 [21:38:20<16:08:13,  1.10s/it]

{'loss': 0.6678, 'learning_rate': 2.1655384615384617e-05, 'epoch': 1.7}


 57%|█████▋    | 69120/121875 [21:38:53<16:04:54,  1.10s/it]

{'loss': 0.7126, 'learning_rate': 2.1643076923076923e-05, 'epoch': 1.7}


 57%|█████▋    | 69150/121875 [21:39:26<16:06:11,  1.10s/it]

{'loss': 0.6945, 'learning_rate': 2.1630769230769232e-05, 'epoch': 1.7}


 57%|█████▋    | 69180/121875 [21:39:59<16:08:45,  1.10s/it]

{'loss': 0.6835, 'learning_rate': 2.161846153846154e-05, 'epoch': 1.7}


 57%|█████▋    | 69210/121875 [21:40:32<16:00:01,  1.09s/it]

{'loss': 0.6952, 'learning_rate': 2.1606153846153847e-05, 'epoch': 1.7}


 57%|█████▋    | 69240/121875 [21:41:05<16:08:00,  1.10s/it]

{'loss': 0.6964, 'learning_rate': 2.1593846153846156e-05, 'epoch': 1.7}


 57%|█████▋    | 69270/121875 [21:41:39<16:04:52,  1.10s/it]

{'loss': 0.6959, 'learning_rate': 2.1581538461538462e-05, 'epoch': 1.71}


 57%|█████▋    | 69300/121875 [21:42:11<16:02:47,  1.10s/it]

{'loss': 0.722, 'learning_rate': 2.156923076923077e-05, 'epoch': 1.71}


 57%|█████▋    | 69330/121875 [21:42:44<16:00:20,  1.10s/it]

{'loss': 0.646, 'learning_rate': 2.1556923076923077e-05, 'epoch': 1.71}


 57%|█████▋    | 69360/121875 [21:43:17<16:05:49,  1.10s/it]

{'loss': 0.7202, 'learning_rate': 2.1544615384615386e-05, 'epoch': 1.71}


 57%|█████▋    | 69390/121875 [21:43:50<16:07:33,  1.11s/it]

{'loss': 0.7125, 'learning_rate': 2.1532307692307695e-05, 'epoch': 1.71}


 57%|█████▋    | 69420/121875 [21:44:23<16:02:25,  1.10s/it]

{'loss': 0.7437, 'learning_rate': 2.152e-05, 'epoch': 1.71}


 57%|█████▋    | 69450/121875 [21:44:56<16:03:16,  1.10s/it]

{'loss': 0.6579, 'learning_rate': 2.150769230769231e-05, 'epoch': 1.71}


 57%|█████▋    | 69480/121875 [21:45:29<15:57:53,  1.10s/it]

{'loss': 0.7385, 'learning_rate': 2.149538461538462e-05, 'epoch': 1.71}


 57%|█████▋    | 69510/121875 [21:46:05<16:21:38,  1.12s/it]

{'loss': 0.7124, 'learning_rate': 2.1483076923076925e-05, 'epoch': 1.71}


 57%|█████▋    | 69540/121875 [21:46:38<15:58:56,  1.10s/it]

{'loss': 0.7054, 'learning_rate': 2.147076923076923e-05, 'epoch': 1.71}


 57%|█████▋    | 69570/121875 [21:47:11<16:03:09,  1.10s/it]

{'loss': 0.6898, 'learning_rate': 2.145846153846154e-05, 'epoch': 1.71}


 57%|█████▋    | 69600/121875 [21:47:44<15:58:45,  1.10s/it]

{'loss': 0.6848, 'learning_rate': 2.1446153846153845e-05, 'epoch': 1.71}


 57%|█████▋    | 69630/121875 [21:48:17<15:57:26,  1.10s/it]

{'loss': 0.6417, 'learning_rate': 2.1433846153846154e-05, 'epoch': 1.71}


 57%|█████▋    | 69660/121875 [21:48:50<15:55:26,  1.10s/it]

{'loss': 0.7579, 'learning_rate': 2.1421538461538463e-05, 'epoch': 1.71}


 57%|█████▋    | 69690/121875 [21:49:23<15:56:43,  1.10s/it]

{'loss': 0.7518, 'learning_rate': 2.140923076923077e-05, 'epoch': 1.72}


 57%|█████▋    | 69720/121875 [21:49:56<15:56:26,  1.10s/it]

{'loss': 0.6516, 'learning_rate': 2.1396923076923078e-05, 'epoch': 1.72}


 57%|█████▋    | 69750/121875 [21:50:29<15:53:49,  1.10s/it]

{'loss': 0.7006, 'learning_rate': 2.1384615384615387e-05, 'epoch': 1.72}


 57%|█████▋    | 69780/121875 [21:51:02<15:55:35,  1.10s/it]

{'loss': 0.7295, 'learning_rate': 2.1372307692307693e-05, 'epoch': 1.72}


 57%|█████▋    | 69810/121875 [21:51:35<15:45:32,  1.09s/it]

{'loss': 0.7527, 'learning_rate': 2.1360000000000002e-05, 'epoch': 1.72}


 57%|█████▋    | 69840/121875 [21:52:08<15:54:01,  1.10s/it]

{'loss': 0.7002, 'learning_rate': 2.1347692307692308e-05, 'epoch': 1.72}


 57%|█████▋    | 69870/121875 [21:52:41<15:53:14,  1.10s/it]

{'loss': 0.733, 'learning_rate': 2.1335384615384617e-05, 'epoch': 1.72}


 57%|█████▋    | 69900/121875 [21:53:14<15:52:57,  1.10s/it]

{'loss': 0.7439, 'learning_rate': 2.1323076923076923e-05, 'epoch': 1.72}


 57%|█████▋    | 69930/121875 [21:53:47<15:53:13,  1.10s/it]

{'loss': 0.6932, 'learning_rate': 2.1310769230769232e-05, 'epoch': 1.72}


 57%|█████▋    | 69960/121875 [21:54:20<15:50:11,  1.10s/it]

{'loss': 0.6702, 'learning_rate': 2.129846153846154e-05, 'epoch': 1.72}


 57%|█████▋    | 69990/121875 [21:54:53<15:50:53,  1.10s/it]

{'loss': 0.7081, 'learning_rate': 2.1286153846153847e-05, 'epoch': 1.72}


 57%|█████▋    | 70020/121875 [21:55:28<15:52:53,  1.10s/it]

{'loss': 0.674, 'learning_rate': 2.1273846153846156e-05, 'epoch': 1.72}


 57%|█████▋    | 70050/121875 [21:56:01<15:50:58,  1.10s/it]

{'loss': 0.6841, 'learning_rate': 2.1261538461538465e-05, 'epoch': 1.72}


 58%|█████▊    | 70080/121875 [21:56:34<15:47:24,  1.10s/it]

{'loss': 0.7517, 'learning_rate': 2.124923076923077e-05, 'epoch': 1.73}


 58%|█████▊    | 70110/121875 [21:57:07<15:50:27,  1.10s/it]

{'loss': 0.6769, 'learning_rate': 2.123692307692308e-05, 'epoch': 1.73}


 58%|█████▊    | 70140/121875 [21:57:40<15:51:27,  1.10s/it]

{'loss': 0.707, 'learning_rate': 2.1224615384615385e-05, 'epoch': 1.73}


 58%|█████▊    | 70170/121875 [21:58:13<15:48:28,  1.10s/it]

{'loss': 0.721, 'learning_rate': 2.121230769230769e-05, 'epoch': 1.73}


 58%|█████▊    | 70200/121875 [21:58:46<15:48:13,  1.10s/it]

{'loss': 0.6667, 'learning_rate': 2.12e-05, 'epoch': 1.73}


 58%|█████▊    | 70230/121875 [21:59:19<15:46:46,  1.10s/it]

{'loss': 0.6316, 'learning_rate': 2.118769230769231e-05, 'epoch': 1.73}


 58%|█████▊    | 70260/121875 [21:59:52<15:45:39,  1.10s/it]

{'loss': 0.7021, 'learning_rate': 2.1175384615384615e-05, 'epoch': 1.73}


 58%|█████▊    | 70290/121875 [22:00:25<15:48:44,  1.10s/it]

{'loss': 0.7084, 'learning_rate': 2.1163076923076924e-05, 'epoch': 1.73}


 58%|█████▊    | 70320/121875 [22:00:58<15:36:22,  1.09s/it]

{'loss': 0.6791, 'learning_rate': 2.1150769230769233e-05, 'epoch': 1.73}


 58%|█████▊    | 70350/121875 [22:01:31<15:42:51,  1.10s/it]

{'loss': 0.6969, 'learning_rate': 2.113846153846154e-05, 'epoch': 1.73}


 58%|█████▊    | 70380/121875 [22:02:04<15:42:50,  1.10s/it]

{'loss': 0.6546, 'learning_rate': 2.1126153846153848e-05, 'epoch': 1.73}


 58%|█████▊    | 70410/121875 [22:02:37<15:46:02,  1.10s/it]

{'loss': 0.6834, 'learning_rate': 2.1113846153846154e-05, 'epoch': 1.73}


 58%|█████▊    | 70440/121875 [22:03:10<15:40:15,  1.10s/it]

{'loss': 0.7517, 'learning_rate': 2.1101538461538463e-05, 'epoch': 1.73}


 58%|█████▊    | 70470/121875 [22:03:43<15:43:51,  1.10s/it]

{'loss': 0.6598, 'learning_rate': 2.108923076923077e-05, 'epoch': 1.73}


 58%|█████▊    | 70500/121875 [22:04:16<15:39:15,  1.10s/it]

{'loss': 0.6424, 'learning_rate': 2.1076923076923078e-05, 'epoch': 1.74}


 58%|█████▊    | 70530/121875 [22:04:51<15:39:34,  1.10s/it]

{'loss': 0.6218, 'learning_rate': 2.1064615384615387e-05, 'epoch': 1.74}


 58%|█████▊    | 70560/121875 [22:05:24<15:40:12,  1.10s/it]

{'loss': 0.6801, 'learning_rate': 2.1052307692307693e-05, 'epoch': 1.74}


 58%|█████▊    | 70590/121875 [22:05:57<15:41:56,  1.10s/it]

{'loss': 0.6552, 'learning_rate': 2.1040000000000002e-05, 'epoch': 1.74}


 58%|█████▊    | 70620/121875 [22:06:30<15:37:42,  1.10s/it]

{'loss': 0.7159, 'learning_rate': 2.102769230769231e-05, 'epoch': 1.74}


 58%|█████▊    | 70650/121875 [22:07:03<15:38:51,  1.10s/it]

{'loss': 0.7314, 'learning_rate': 2.1015384615384617e-05, 'epoch': 1.74}


 58%|█████▊    | 70680/121875 [22:07:36<15:39:55,  1.10s/it]

{'loss': 0.6339, 'learning_rate': 2.1003076923076926e-05, 'epoch': 1.74}


 58%|█████▊    | 70710/121875 [22:08:09<15:41:39,  1.10s/it]

{'loss': 0.7454, 'learning_rate': 2.099076923076923e-05, 'epoch': 1.74}


 58%|█████▊    | 70740/121875 [22:08:42<15:37:39,  1.10s/it]

{'loss': 0.7152, 'learning_rate': 2.0978461538461537e-05, 'epoch': 1.74}


 58%|█████▊    | 70770/121875 [22:09:15<15:37:03,  1.10s/it]

{'loss': 0.8065, 'learning_rate': 2.0966153846153846e-05, 'epoch': 1.74}


 58%|█████▊    | 70800/121875 [22:09:48<15:35:20,  1.10s/it]

{'loss': 0.7167, 'learning_rate': 2.0953846153846155e-05, 'epoch': 1.74}


 58%|█████▊    | 70830/121875 [22:10:21<15:39:24,  1.10s/it]

{'loss': 0.6657, 'learning_rate': 2.094153846153846e-05, 'epoch': 1.74}


 58%|█████▊    | 70860/121875 [22:10:54<15:30:34,  1.09s/it]

{'loss': 0.6842, 'learning_rate': 2.092923076923077e-05, 'epoch': 1.74}


 58%|█████▊    | 70890/121875 [22:11:27<15:33:29,  1.10s/it]

{'loss': 0.6711, 'learning_rate': 2.091692307692308e-05, 'epoch': 1.74}


 58%|█████▊    | 70920/121875 [22:12:00<15:35:12,  1.10s/it]

{'loss': 0.7116, 'learning_rate': 2.0904615384615385e-05, 'epoch': 1.75}


 58%|█████▊    | 70950/121875 [22:12:33<15:33:15,  1.10s/it]

{'loss': 0.7308, 'learning_rate': 2.0892307692307694e-05, 'epoch': 1.75}


 58%|█████▊    | 70980/121875 [22:13:06<15:29:39,  1.10s/it]

{'loss': 0.7006, 'learning_rate': 2.0880000000000003e-05, 'epoch': 1.75}


 58%|█████▊    | 71010/121875 [22:13:41<15:57:26,  1.13s/it]

{'loss': 0.6835, 'learning_rate': 2.086769230769231e-05, 'epoch': 1.75}


 58%|█████▊    | 71040/121875 [22:14:14<15:33:54,  1.10s/it]

{'loss': 0.673, 'learning_rate': 2.0855384615384615e-05, 'epoch': 1.75}


 58%|█████▊    | 71070/121875 [22:14:47<15:27:28,  1.10s/it]

{'loss': 0.754, 'learning_rate': 2.0843076923076924e-05, 'epoch': 1.75}


 58%|█████▊    | 71100/121875 [22:15:20<15:31:14,  1.10s/it]

{'loss': 0.7364, 'learning_rate': 2.0830769230769233e-05, 'epoch': 1.75}


 58%|█████▊    | 71130/121875 [22:15:53<15:33:54,  1.10s/it]

{'loss': 0.6689, 'learning_rate': 2.081846153846154e-05, 'epoch': 1.75}


 58%|█████▊    | 71160/121875 [22:16:26<15:26:39,  1.10s/it]

{'loss': 0.656, 'learning_rate': 2.0806153846153848e-05, 'epoch': 1.75}


 58%|█████▊    | 71190/121875 [22:16:59<15:30:33,  1.10s/it]

{'loss': 0.6773, 'learning_rate': 2.0793846153846157e-05, 'epoch': 1.75}


 58%|█████▊    | 71220/121875 [22:17:32<15:28:59,  1.10s/it]

{'loss': 0.6749, 'learning_rate': 2.0781538461538463e-05, 'epoch': 1.75}


 58%|█████▊    | 71250/121875 [22:18:05<15:25:40,  1.10s/it]

{'loss': 0.6878, 'learning_rate': 2.0769230769230772e-05, 'epoch': 1.75}


 58%|█████▊    | 71280/121875 [22:18:38<15:28:43,  1.10s/it]

{'loss': 0.7314, 'learning_rate': 2.0756923076923078e-05, 'epoch': 1.75}


 59%|█████▊    | 71310/121875 [22:19:11<15:24:51,  1.10s/it]

{'loss': 0.6765, 'learning_rate': 2.0744615384615383e-05, 'epoch': 1.76}


 59%|█████▊    | 71340/121875 [22:19:44<15:23:23,  1.10s/it]

{'loss': 0.6713, 'learning_rate': 2.0732307692307692e-05, 'epoch': 1.76}


 59%|█████▊    | 71370/121875 [22:20:17<15:25:51,  1.10s/it]

{'loss': 0.6541, 'learning_rate': 2.072e-05, 'epoch': 1.76}


 59%|█████▊    | 71400/121875 [22:20:50<15:22:07,  1.10s/it]

{'loss': 0.6733, 'learning_rate': 2.0707692307692307e-05, 'epoch': 1.76}


 59%|█████▊    | 71430/121875 [22:21:23<15:22:34,  1.10s/it]

{'loss': 0.7008, 'learning_rate': 2.0695384615384616e-05, 'epoch': 1.76}


 59%|█████▊    | 71460/121875 [22:21:56<15:20:44,  1.10s/it]

{'loss': 0.7104, 'learning_rate': 2.0683076923076925e-05, 'epoch': 1.76}


 59%|█████▊    | 71490/121875 [22:22:29<15:22:53,  1.10s/it]

{'loss': 0.6942, 'learning_rate': 2.067076923076923e-05, 'epoch': 1.76}


 59%|█████▊    | 71520/121875 [22:23:04<15:23:04,  1.10s/it]

{'loss': 0.6694, 'learning_rate': 2.065846153846154e-05, 'epoch': 1.76}


 59%|█████▊    | 71550/121875 [22:23:37<15:21:15,  1.10s/it]

{'loss': 0.6812, 'learning_rate': 2.064615384615385e-05, 'epoch': 1.76}


 59%|█████▊    | 71580/121875 [22:24:10<15:21:53,  1.10s/it]

{'loss': 0.7248, 'learning_rate': 2.0633846153846155e-05, 'epoch': 1.76}


 59%|█████▉    | 71610/121875 [22:24:43<15:21:47,  1.10s/it]

{'loss': 0.6926, 'learning_rate': 2.062153846153846e-05, 'epoch': 1.76}


 59%|█████▉    | 71640/121875 [22:25:16<15:22:56,  1.10s/it]

{'loss': 0.7756, 'learning_rate': 2.060923076923077e-05, 'epoch': 1.76}


 59%|█████▉    | 71670/121875 [22:25:49<15:15:35,  1.09s/it]

{'loss': 0.7444, 'learning_rate': 2.059692307692308e-05, 'epoch': 1.76}


 59%|█████▉    | 71700/121875 [22:26:22<15:17:15,  1.10s/it]

{'loss': 0.6884, 'learning_rate': 2.0584615384615385e-05, 'epoch': 1.76}


 59%|█████▉    | 71730/121875 [22:26:55<15:16:36,  1.10s/it]

{'loss': 0.7118, 'learning_rate': 2.0572307692307694e-05, 'epoch': 1.77}


 59%|█████▉    | 71760/121875 [22:27:28<15:18:31,  1.10s/it]

{'loss': 0.7032, 'learning_rate': 2.0560000000000003e-05, 'epoch': 1.77}


 59%|█████▉    | 71790/121875 [22:28:01<15:16:49,  1.10s/it]

{'loss': 0.6519, 'learning_rate': 2.054769230769231e-05, 'epoch': 1.77}


 59%|█████▉    | 71820/121875 [22:28:34<15:17:36,  1.10s/it]

{'loss': 0.6888, 'learning_rate': 2.0535384615384618e-05, 'epoch': 1.77}


 59%|█████▉    | 71850/121875 [22:29:07<15:14:33,  1.10s/it]

{'loss': 0.6922, 'learning_rate': 2.0523076923076927e-05, 'epoch': 1.77}


 59%|█████▉    | 71880/121875 [22:29:40<15:19:41,  1.10s/it]

{'loss': 0.7181, 'learning_rate': 2.051076923076923e-05, 'epoch': 1.77}


 59%|█████▉    | 71910/121875 [22:30:13<15:19:22,  1.10s/it]

{'loss': 0.6896, 'learning_rate': 2.049846153846154e-05, 'epoch': 1.77}


 59%|█████▉    | 71940/121875 [22:30:46<15:09:30,  1.09s/it]

{'loss': 0.7138, 'learning_rate': 2.0486153846153848e-05, 'epoch': 1.77}


 59%|█████▉    | 71970/121875 [22:31:19<15:14:38,  1.10s/it]

{'loss': 0.6454, 'learning_rate': 2.0473846153846153e-05, 'epoch': 1.77}


 59%|█████▉    | 72000/121875 [22:31:52<15:13:59,  1.10s/it]

{'loss': 0.6507, 'learning_rate': 2.0461538461538462e-05, 'epoch': 1.77}


 59%|█████▉    | 72030/121875 [22:32:27<15:13:55,  1.10s/it]

{'loss': 0.7399, 'learning_rate': 2.044923076923077e-05, 'epoch': 1.77}


 59%|█████▉    | 72060/121875 [22:33:00<15:15:26,  1.10s/it]

{'loss': 0.7197, 'learning_rate': 2.0436923076923077e-05, 'epoch': 1.77}


 59%|█████▉    | 72090/121875 [22:33:33<15:13:53,  1.10s/it]

{'loss': 0.6949, 'learning_rate': 2.0424615384615386e-05, 'epoch': 1.77}


 59%|█████▉    | 72120/121875 [22:34:06<15:14:30,  1.10s/it]

{'loss': 0.718, 'learning_rate': 2.0412307692307695e-05, 'epoch': 1.78}


 59%|█████▉    | 72150/121875 [22:34:39<15:12:27,  1.10s/it]

{'loss': 0.6932, 'learning_rate': 2.04e-05, 'epoch': 1.78}


 59%|█████▉    | 72180/121875 [22:35:12<15:10:58,  1.10s/it]

{'loss': 0.7406, 'learning_rate': 2.0387692307692307e-05, 'epoch': 1.78}


 59%|█████▉    | 72210/121875 [22:35:45<14:57:53,  1.08s/it]

{'loss': 0.7075, 'learning_rate': 2.0375384615384616e-05, 'epoch': 1.78}


 59%|█████▉    | 72240/121875 [22:36:17<14:56:01,  1.08s/it]

{'loss': 0.733, 'learning_rate': 2.0363076923076925e-05, 'epoch': 1.78}


 59%|█████▉    | 72270/121875 [22:36:50<15:13:50,  1.11s/it]

{'loss': 0.7078, 'learning_rate': 2.035076923076923e-05, 'epoch': 1.78}


 59%|█████▉    | 72300/121875 [22:37:23<15:10:27,  1.10s/it]

{'loss': 0.6992, 'learning_rate': 2.033846153846154e-05, 'epoch': 1.78}


 59%|█████▉    | 72330/121875 [22:37:56<15:07:11,  1.10s/it]

{'loss': 0.6479, 'learning_rate': 2.032615384615385e-05, 'epoch': 1.78}


 59%|█████▉    | 72360/121875 [22:38:29<15:06:52,  1.10s/it]

{'loss': 0.6935, 'learning_rate': 2.0313846153846155e-05, 'epoch': 1.78}


 59%|█████▉    | 72390/121875 [22:39:02<15:06:56,  1.10s/it]

{'loss': 0.6878, 'learning_rate': 2.0301538461538464e-05, 'epoch': 1.78}


 59%|█████▉    | 72420/121875 [22:39:35<15:05:01,  1.10s/it]

{'loss': 0.6991, 'learning_rate': 2.0289230769230773e-05, 'epoch': 1.78}


 59%|█████▉    | 72450/121875 [22:40:08<15:04:11,  1.10s/it]

{'loss': 0.7192, 'learning_rate': 2.0276923076923075e-05, 'epoch': 1.78}


 59%|█████▉    | 72480/121875 [22:40:41<15:04:51,  1.10s/it]

{'loss': 0.7382, 'learning_rate': 2.0264615384615384e-05, 'epoch': 1.78}


 59%|█████▉    | 72510/121875 [22:41:16<15:27:03,  1.13s/it]

{'loss': 0.6726, 'learning_rate': 2.0252307692307694e-05, 'epoch': 1.78}


 60%|█████▉    | 72540/121875 [22:41:49<15:10:25,  1.11s/it]

{'loss': 0.6841, 'learning_rate': 2.024e-05, 'epoch': 1.79}


 60%|█████▉    | 72570/121875 [22:42:22<15:05:13,  1.10s/it]

{'loss': 0.6484, 'learning_rate': 2.022769230769231e-05, 'epoch': 1.79}


 60%|█████▉    | 72600/121875 [22:42:55<15:04:18,  1.10s/it]

{'loss': 0.7107, 'learning_rate': 2.0215384615384618e-05, 'epoch': 1.79}


 60%|█████▉    | 72630/121875 [22:43:28<15:04:17,  1.10s/it]

{'loss': 0.7312, 'learning_rate': 2.0203076923076923e-05, 'epoch': 1.79}


 60%|█████▉    | 72660/121875 [22:44:01<15:06:26,  1.11s/it]

{'loss': 0.6943, 'learning_rate': 2.0190769230769232e-05, 'epoch': 1.79}


 60%|█████▉    | 72690/121875 [22:44:34<15:04:40,  1.10s/it]

{'loss': 0.6624, 'learning_rate': 2.017846153846154e-05, 'epoch': 1.79}


 60%|█████▉    | 72720/121875 [22:45:07<14:58:10,  1.10s/it]

{'loss': 0.6939, 'learning_rate': 2.0166153846153847e-05, 'epoch': 1.79}


 60%|█████▉    | 72750/121875 [22:45:40<15:01:28,  1.10s/it]

{'loss': 0.6797, 'learning_rate': 2.0153846153846153e-05, 'epoch': 1.79}


 60%|█████▉    | 72780/121875 [22:46:13<15:00:40,  1.10s/it]

{'loss': 0.7048, 'learning_rate': 2.0141538461538462e-05, 'epoch': 1.79}


 60%|█████▉    | 72810/121875 [22:46:46<14:58:44,  1.10s/it]

{'loss': 0.721, 'learning_rate': 2.012923076923077e-05, 'epoch': 1.79}


 60%|█████▉    | 72840/121875 [22:47:19<15:02:22,  1.10s/it]

{'loss': 0.6918, 'learning_rate': 2.0116923076923077e-05, 'epoch': 1.79}


 60%|█████▉    | 72870/121875 [22:47:52<14:55:47,  1.10s/it]

{'loss': 0.7221, 'learning_rate': 2.0104615384615386e-05, 'epoch': 1.79}


 60%|█████▉    | 72900/121875 [22:48:25<14:57:46,  1.10s/it]

{'loss': 0.6664, 'learning_rate': 2.0092307692307695e-05, 'epoch': 1.79}


 60%|█████▉    | 72930/121875 [22:48:58<14:59:43,  1.10s/it]

{'loss': 0.7101, 'learning_rate': 2.008e-05, 'epoch': 1.8}


 60%|█████▉    | 72960/121875 [22:49:31<14:59:51,  1.10s/it]

{'loss': 0.7297, 'learning_rate': 2.006769230769231e-05, 'epoch': 1.8}


 60%|█████▉    | 72990/121875 [22:50:04<14:48:33,  1.09s/it]

{'loss': 0.7127, 'learning_rate': 2.005538461538462e-05, 'epoch': 1.8}


 60%|█████▉    | 73020/121875 [22:50:39<14:53:36,  1.10s/it]

{'loss': 0.6692, 'learning_rate': 2.004307692307692e-05, 'epoch': 1.8}


 60%|█████▉    | 73050/121875 [22:51:12<14:57:32,  1.10s/it]

{'loss': 0.7034, 'learning_rate': 2.003076923076923e-05, 'epoch': 1.8}


 60%|█████▉    | 73080/121875 [22:51:45<14:54:11,  1.10s/it]

{'loss': 0.6959, 'learning_rate': 2.001846153846154e-05, 'epoch': 1.8}


 60%|█████▉    | 73110/121875 [22:52:18<14:55:31,  1.10s/it]

{'loss': 0.7188, 'learning_rate': 2.0006153846153845e-05, 'epoch': 1.8}


 60%|██████    | 73140/121875 [22:52:51<14:52:54,  1.10s/it]

{'loss': 0.6651, 'learning_rate': 1.9993846153846154e-05, 'epoch': 1.8}


 60%|██████    | 73170/121875 [22:53:24<14:46:52,  1.09s/it]

{'loss': 0.6921, 'learning_rate': 1.9981538461538464e-05, 'epoch': 1.8}


 60%|██████    | 73200/121875 [22:53:57<14:51:40,  1.10s/it]

{'loss': 0.7079, 'learning_rate': 1.996923076923077e-05, 'epoch': 1.8}


 60%|██████    | 73230/121875 [22:54:30<14:51:49,  1.10s/it]

{'loss': 0.7004, 'learning_rate': 1.995692307692308e-05, 'epoch': 1.8}


 60%|██████    | 73260/121875 [22:55:03<14:55:57,  1.11s/it]

{'loss': 0.7019, 'learning_rate': 1.9944615384615388e-05, 'epoch': 1.8}


 60%|██████    | 73290/121875 [22:55:36<14:53:31,  1.10s/it]

{'loss': 0.7729, 'learning_rate': 1.9932307692307693e-05, 'epoch': 1.8}


 60%|██████    | 73320/121875 [22:56:09<14:47:01,  1.10s/it]

{'loss': 0.6757, 'learning_rate': 1.992e-05, 'epoch': 1.8}


 60%|██████    | 73350/121875 [22:56:42<14:53:41,  1.11s/it]

{'loss': 0.6634, 'learning_rate': 1.9907692307692308e-05, 'epoch': 1.81}


 60%|██████    | 73380/121875 [22:57:15<14:48:00,  1.10s/it]

{'loss': 0.742, 'learning_rate': 1.9895384615384617e-05, 'epoch': 1.81}


 60%|██████    | 73410/121875 [22:57:48<14:43:02,  1.09s/it]

{'loss': 0.7193, 'learning_rate': 1.9883076923076923e-05, 'epoch': 1.81}


 60%|██████    | 73440/121875 [22:58:21<14:49:37,  1.10s/it]

{'loss': 0.7623, 'learning_rate': 1.9870769230769232e-05, 'epoch': 1.81}


 60%|██████    | 73470/121875 [22:58:54<14:47:02,  1.10s/it]

{'loss': 0.6646, 'learning_rate': 1.985846153846154e-05, 'epoch': 1.81}


 60%|██████    | 73500/121875 [22:59:27<14:44:15,  1.10s/it]

{'loss': 0.6172, 'learning_rate': 1.9846153846153847e-05, 'epoch': 1.81}


 60%|██████    | 73530/121875 [23:00:03<14:46:07,  1.10s/it]

{'loss': 0.7666, 'learning_rate': 1.9833846153846156e-05, 'epoch': 1.81}


 60%|██████    | 73560/121875 [23:00:36<14:38:13,  1.09s/it]

{'loss': 0.7339, 'learning_rate': 1.9821538461538465e-05, 'epoch': 1.81}


 60%|██████    | 73590/121875 [23:01:09<14:45:20,  1.10s/it]

{'loss': 0.68, 'learning_rate': 1.9809230769230767e-05, 'epoch': 1.81}


 60%|██████    | 73620/121875 [23:01:42<14:41:57,  1.10s/it]

{'loss': 0.7068, 'learning_rate': 1.9796923076923077e-05, 'epoch': 1.81}


 60%|██████    | 73650/121875 [23:02:15<14:44:58,  1.10s/it]

{'loss': 0.6655, 'learning_rate': 1.9784615384615386e-05, 'epoch': 1.81}


 60%|██████    | 73680/121875 [23:02:48<14:42:39,  1.10s/it]

{'loss': 0.6071, 'learning_rate': 1.977230769230769e-05, 'epoch': 1.81}


 60%|██████    | 73710/121875 [23:03:21<14:40:11,  1.10s/it]

{'loss': 0.7036, 'learning_rate': 1.976e-05, 'epoch': 1.81}


 61%|██████    | 73740/121875 [23:03:54<14:46:57,  1.11s/it]

{'loss': 0.7616, 'learning_rate': 1.974769230769231e-05, 'epoch': 1.82}


 61%|██████    | 73770/121875 [23:04:27<14:41:39,  1.10s/it]

{'loss': 0.6399, 'learning_rate': 1.9735384615384615e-05, 'epoch': 1.82}


 61%|██████    | 73800/121875 [23:04:59<14:35:44,  1.09s/it]

{'loss': 0.6779, 'learning_rate': 1.9723076923076924e-05, 'epoch': 1.82}


 61%|██████    | 73830/121875 [23:05:32<14:36:41,  1.09s/it]

{'loss': 0.7177, 'learning_rate': 1.9710769230769234e-05, 'epoch': 1.82}


 61%|██████    | 73860/121875 [23:06:05<14:43:43,  1.10s/it]

{'loss': 0.6454, 'learning_rate': 1.969846153846154e-05, 'epoch': 1.82}


 61%|██████    | 73890/121875 [23:06:38<14:43:32,  1.10s/it]

{'loss': 0.6383, 'learning_rate': 1.9686153846153845e-05, 'epoch': 1.82}


 61%|██████    | 73920/121875 [23:07:11<14:40:05,  1.10s/it]

{'loss': 0.7149, 'learning_rate': 1.9673846153846154e-05, 'epoch': 1.82}


 61%|██████    | 73950/121875 [23:07:44<14:40:05,  1.10s/it]

{'loss': 0.7017, 'learning_rate': 1.9661538461538463e-05, 'epoch': 1.82}


 61%|██████    | 73980/121875 [23:08:17<14:42:03,  1.10s/it]

{'loss': 0.7544, 'learning_rate': 1.964923076923077e-05, 'epoch': 1.82}


 61%|██████    | 74010/121875 [23:08:53<15:01:11,  1.13s/it]

{'loss': 0.6737, 'learning_rate': 1.9636923076923078e-05, 'epoch': 1.82}


 61%|██████    | 74040/121875 [23:09:26<14:34:59,  1.10s/it]

{'loss': 0.6533, 'learning_rate': 1.9624615384615387e-05, 'epoch': 1.82}


 61%|██████    | 74070/121875 [23:09:59<14:34:35,  1.10s/it]

{'loss': 0.7702, 'learning_rate': 1.9612307692307693e-05, 'epoch': 1.82}


 61%|██████    | 74100/121875 [23:10:32<14:32:52,  1.10s/it]

{'loss': 0.7682, 'learning_rate': 1.9600000000000002e-05, 'epoch': 1.82}


 61%|██████    | 74130/121875 [23:11:05<14:37:36,  1.10s/it]

{'loss': 0.6847, 'learning_rate': 1.958769230769231e-05, 'epoch': 1.82}


 61%|██████    | 74160/121875 [23:11:38<14:33:23,  1.10s/it]

{'loss': 0.6323, 'learning_rate': 1.9575384615384617e-05, 'epoch': 1.83}


 61%|██████    | 74190/121875 [23:12:11<14:33:02,  1.10s/it]

{'loss': 0.6641, 'learning_rate': 1.9563076923076923e-05, 'epoch': 1.83}


 61%|██████    | 74220/121875 [23:12:44<14:27:36,  1.09s/it]

{'loss': 0.7356, 'learning_rate': 1.9550769230769232e-05, 'epoch': 1.83}


 61%|██████    | 74250/121875 [23:13:17<14:33:25,  1.10s/it]

{'loss': 0.7247, 'learning_rate': 1.9538461538461537e-05, 'epoch': 1.83}


 61%|██████    | 74280/121875 [23:13:50<14:38:20,  1.11s/it]

{'loss': 0.68, 'learning_rate': 1.9526153846153847e-05, 'epoch': 1.83}


 61%|██████    | 74310/121875 [23:14:23<14:31:27,  1.10s/it]

{'loss': 0.7722, 'learning_rate': 1.9513846153846156e-05, 'epoch': 1.83}


 61%|██████    | 74340/121875 [23:14:56<14:23:38,  1.09s/it]

{'loss': 0.7352, 'learning_rate': 1.950153846153846e-05, 'epoch': 1.83}


 61%|██████    | 74370/121875 [23:15:29<14:27:29,  1.10s/it]

{'loss': 0.6642, 'learning_rate': 1.948923076923077e-05, 'epoch': 1.83}


 61%|██████    | 74400/121875 [23:16:02<14:32:50,  1.10s/it]

{'loss': 0.6828, 'learning_rate': 1.947692307692308e-05, 'epoch': 1.83}


 61%|██████    | 74430/121875 [23:16:35<14:25:36,  1.09s/it]

{'loss': 0.6925, 'learning_rate': 1.9464615384615385e-05, 'epoch': 1.83}


 61%|██████    | 74460/121875 [23:17:08<14:26:26,  1.10s/it]

{'loss': 0.7748, 'learning_rate': 1.945230769230769e-05, 'epoch': 1.83}


 61%|██████    | 74490/121875 [23:17:41<14:29:24,  1.10s/it]

{'loss': 0.7121, 'learning_rate': 1.944e-05, 'epoch': 1.83}


 61%|██████    | 74520/121875 [23:18:16<14:27:33,  1.10s/it]

{'loss': 0.7168, 'learning_rate': 1.942769230769231e-05, 'epoch': 1.83}


 61%|██████    | 74550/121875 [23:18:49<14:32:33,  1.11s/it]

{'loss': 0.6559, 'learning_rate': 1.9415384615384615e-05, 'epoch': 1.84}


 61%|██████    | 74580/121875 [23:19:22<14:30:27,  1.10s/it]

{'loss': 0.7009, 'learning_rate': 1.9403076923076924e-05, 'epoch': 1.84}


 61%|██████    | 74610/121875 [23:19:55<14:26:27,  1.10s/it]

{'loss': 0.6399, 'learning_rate': 1.9390769230769233e-05, 'epoch': 1.84}


 61%|██████    | 74640/121875 [23:20:28<14:24:38,  1.10s/it]

{'loss': 0.6574, 'learning_rate': 1.937846153846154e-05, 'epoch': 1.84}


 61%|██████▏   | 74670/121875 [23:21:01<14:24:54,  1.10s/it]

{'loss': 0.7081, 'learning_rate': 1.9366153846153848e-05, 'epoch': 1.84}


 61%|██████▏   | 74700/121875 [23:21:34<14:20:19,  1.09s/it]

{'loss': 0.6753, 'learning_rate': 1.9353846153846157e-05, 'epoch': 1.84}


 61%|██████▏   | 74730/121875 [23:22:07<14:25:09,  1.10s/it]

{'loss': 0.7055, 'learning_rate': 1.9341538461538463e-05, 'epoch': 1.84}


 61%|██████▏   | 74760/121875 [23:22:40<14:21:43,  1.10s/it]

{'loss': 0.6141, 'learning_rate': 1.932923076923077e-05, 'epoch': 1.84}


 61%|██████▏   | 74790/121875 [23:23:13<14:24:58,  1.10s/it]

{'loss': 0.7968, 'learning_rate': 1.9316923076923078e-05, 'epoch': 1.84}


 61%|██████▏   | 74820/121875 [23:23:46<14:25:10,  1.10s/it]

{'loss': 0.7307, 'learning_rate': 1.9304615384615384e-05, 'epoch': 1.84}


 61%|██████▏   | 74850/121875 [23:24:19<14:21:51,  1.10s/it]

{'loss': 0.7226, 'learning_rate': 1.9292307692307693e-05, 'epoch': 1.84}


 61%|██████▏   | 74880/121875 [23:24:52<14:21:46,  1.10s/it]

{'loss': 0.7209, 'learning_rate': 1.9280000000000002e-05, 'epoch': 1.84}


 61%|██████▏   | 74910/121875 [23:25:25<14:20:50,  1.10s/it]

{'loss': 0.6724, 'learning_rate': 1.9267692307692307e-05, 'epoch': 1.84}


 61%|██████▏   | 74940/121875 [23:25:58<14:23:09,  1.10s/it]

{'loss': 0.755, 'learning_rate': 1.9255384615384617e-05, 'epoch': 1.84}


 62%|██████▏   | 74970/121875 [23:26:31<14:19:33,  1.10s/it]

{'loss': 0.6863, 'learning_rate': 1.9243076923076926e-05, 'epoch': 1.85}


 62%|██████▏   | 75000/121875 [23:27:04<14:18:37,  1.10s/it]

{'loss': 0.6911, 'learning_rate': 1.923076923076923e-05, 'epoch': 1.85}


 62%|██████▏   | 75030/121875 [23:27:39<14:15:46,  1.10s/it]

{'loss': 0.7275, 'learning_rate': 1.921846153846154e-05, 'epoch': 1.85}


 62%|██████▏   | 75060/121875 [23:28:12<14:21:23,  1.10s/it]

{'loss': 0.6563, 'learning_rate': 1.9206153846153846e-05, 'epoch': 1.85}


 62%|██████▏   | 75090/121875 [23:28:45<14:18:59,  1.10s/it]

{'loss': 0.6867, 'learning_rate': 1.9193846153846155e-05, 'epoch': 1.85}


 62%|██████▏   | 75120/121875 [23:29:18<14:17:58,  1.10s/it]

{'loss': 0.649, 'learning_rate': 1.918153846153846e-05, 'epoch': 1.85}


 62%|██████▏   | 75150/121875 [23:29:51<14:19:01,  1.10s/it]

{'loss': 0.6716, 'learning_rate': 1.916923076923077e-05, 'epoch': 1.85}


 62%|██████▏   | 75180/121875 [23:30:24<14:14:36,  1.10s/it]

{'loss': 0.7198, 'learning_rate': 1.915692307692308e-05, 'epoch': 1.85}


 62%|██████▏   | 75210/121875 [23:30:57<14:17:44,  1.10s/it]

{'loss': 0.7082, 'learning_rate': 1.9144615384615385e-05, 'epoch': 1.85}


 62%|██████▏   | 75240/121875 [23:31:30<14:16:21,  1.10s/it]

{'loss': 0.7469, 'learning_rate': 1.9132307692307694e-05, 'epoch': 1.85}


 62%|██████▏   | 75270/121875 [23:32:03<14:14:53,  1.10s/it]

{'loss': 0.7155, 'learning_rate': 1.9120000000000003e-05, 'epoch': 1.85}


 62%|██████▏   | 75300/121875 [23:32:36<14:13:50,  1.10s/it]

{'loss': 0.6535, 'learning_rate': 1.910769230769231e-05, 'epoch': 1.85}


 62%|██████▏   | 75330/121875 [23:33:09<14:08:11,  1.09s/it]

{'loss': 0.6717, 'learning_rate': 1.9095384615384615e-05, 'epoch': 1.85}


 62%|██████▏   | 75360/121875 [23:33:42<14:10:26,  1.10s/it]

{'loss': 0.7053, 'learning_rate': 1.9083076923076924e-05, 'epoch': 1.86}


 62%|██████▏   | 75390/121875 [23:34:15<14:13:56,  1.10s/it]

{'loss': 0.7059, 'learning_rate': 1.907076923076923e-05, 'epoch': 1.86}


 62%|██████▏   | 75420/121875 [23:34:48<14:15:42,  1.11s/it]

{'loss': 0.649, 'learning_rate': 1.905846153846154e-05, 'epoch': 1.86}


 62%|██████▏   | 75450/121875 [23:35:21<14:15:31,  1.11s/it]

{'loss': 0.6887, 'learning_rate': 1.9046153846153848e-05, 'epoch': 1.86}


 62%|██████▏   | 75480/121875 [23:35:54<14:08:08,  1.10s/it]

{'loss': 0.7552, 'learning_rate': 1.9033846153846154e-05, 'epoch': 1.86}


 62%|██████▏   | 75510/121875 [23:36:30<14:30:42,  1.13s/it]

{'loss': 0.7109, 'learning_rate': 1.9021538461538463e-05, 'epoch': 1.86}


 62%|██████▏   | 75540/121875 [23:37:03<14:08:33,  1.10s/it]

{'loss': 0.6773, 'learning_rate': 1.9009230769230772e-05, 'epoch': 1.86}


 62%|██████▏   | 75570/121875 [23:37:36<14:07:35,  1.10s/it]

{'loss': 0.732, 'learning_rate': 1.8996923076923077e-05, 'epoch': 1.86}


 62%|██████▏   | 75600/121875 [23:38:09<14:06:18,  1.10s/it]

{'loss': 0.6401, 'learning_rate': 1.8984615384615387e-05, 'epoch': 1.86}


 62%|██████▏   | 75630/121875 [23:38:42<14:04:40,  1.10s/it]

{'loss': 0.7109, 'learning_rate': 1.8972307692307692e-05, 'epoch': 1.86}


 62%|██████▏   | 75660/121875 [23:39:15<14:06:52,  1.10s/it]

{'loss': 0.6375, 'learning_rate': 1.896e-05, 'epoch': 1.86}


 62%|██████▏   | 75690/121875 [23:39:48<14:07:46,  1.10s/it]

{'loss': 0.7259, 'learning_rate': 1.8947692307692307e-05, 'epoch': 1.86}


 62%|██████▏   | 75720/121875 [23:40:21<14:02:29,  1.10s/it]

{'loss': 0.6792, 'learning_rate': 1.8935384615384616e-05, 'epoch': 1.86}


 62%|██████▏   | 75750/121875 [23:40:54<14:04:36,  1.10s/it]

{'loss': 0.6908, 'learning_rate': 1.8923076923076925e-05, 'epoch': 1.86}


 62%|██████▏   | 75780/121875 [23:41:27<13:55:08,  1.09s/it]

{'loss': 0.6432, 'learning_rate': 1.891076923076923e-05, 'epoch': 1.87}


 62%|██████▏   | 75810/121875 [23:42:00<14:05:50,  1.10s/it]

{'loss': 0.6211, 'learning_rate': 1.889846153846154e-05, 'epoch': 1.87}


 62%|██████▏   | 75840/121875 [23:42:33<13:59:56,  1.09s/it]

{'loss': 0.6941, 'learning_rate': 1.888615384615385e-05, 'epoch': 1.87}


 62%|██████▏   | 75870/121875 [23:43:06<14:01:03,  1.10s/it]

{'loss': 0.7021, 'learning_rate': 1.8873846153846155e-05, 'epoch': 1.87}


 62%|██████▏   | 75900/121875 [23:43:39<14:02:11,  1.10s/it]

{'loss': 0.6439, 'learning_rate': 1.8861538461538464e-05, 'epoch': 1.87}


 62%|██████▏   | 75930/121875 [23:44:12<14:00:39,  1.10s/it]

{'loss': 0.6296, 'learning_rate': 1.884923076923077e-05, 'epoch': 1.87}


 62%|██████▏   | 75960/121875 [23:44:45<14:01:45,  1.10s/it]

{'loss': 0.6717, 'learning_rate': 1.8836923076923076e-05, 'epoch': 1.87}


 62%|██████▏   | 75990/121875 [23:45:18<14:01:41,  1.10s/it]

{'loss': 0.7018, 'learning_rate': 1.8824615384615385e-05, 'epoch': 1.87}


 62%|██████▏   | 76020/121875 [23:45:53<14:01:17,  1.10s/it]

{'loss': 0.7755, 'learning_rate': 1.8812307692307694e-05, 'epoch': 1.87}


 62%|██████▏   | 76050/121875 [23:46:26<14:00:38,  1.10s/it]

{'loss': 0.7031, 'learning_rate': 1.88e-05, 'epoch': 1.87}


 62%|██████▏   | 76080/121875 [23:46:59<14:02:37,  1.10s/it]

{'loss': 0.7343, 'learning_rate': 1.878769230769231e-05, 'epoch': 1.87}


 62%|██████▏   | 76110/121875 [23:47:32<13:55:00,  1.09s/it]

{'loss': 0.7667, 'learning_rate': 1.8775384615384618e-05, 'epoch': 1.87}


 62%|██████▏   | 76140/121875 [23:48:05<13:56:40,  1.10s/it]

{'loss': 0.6505, 'learning_rate': 1.8763076923076924e-05, 'epoch': 1.87}


 62%|██████▏   | 76170/121875 [23:48:38<13:55:42,  1.10s/it]

{'loss': 0.7451, 'learning_rate': 1.8750769230769233e-05, 'epoch': 1.87}


 63%|██████▎   | 76200/121875 [23:49:11<13:54:40,  1.10s/it]

{'loss': 0.6687, 'learning_rate': 1.873846153846154e-05, 'epoch': 1.88}


 63%|██████▎   | 76230/121875 [23:49:44<13:57:07,  1.10s/it]

{'loss': 0.7724, 'learning_rate': 1.8726153846153847e-05, 'epoch': 1.88}


 63%|██████▎   | 76260/121875 [23:50:17<13:55:12,  1.10s/it]

{'loss': 0.6836, 'learning_rate': 1.8713846153846153e-05, 'epoch': 1.88}


 63%|██████▎   | 76290/121875 [23:50:50<13:55:20,  1.10s/it]

{'loss': 0.6532, 'learning_rate': 1.8701538461538462e-05, 'epoch': 1.88}


 63%|██████▎   | 76320/121875 [23:51:23<13:53:09,  1.10s/it]

{'loss': 0.6611, 'learning_rate': 1.868923076923077e-05, 'epoch': 1.88}


 63%|██████▎   | 76350/121875 [23:51:56<13:48:44,  1.09s/it]

{'loss': 0.6673, 'learning_rate': 1.8676923076923077e-05, 'epoch': 1.88}


 63%|██████▎   | 76380/121875 [23:52:29<13:55:25,  1.10s/it]

{'loss': 0.6616, 'learning_rate': 1.8664615384615386e-05, 'epoch': 1.88}


 63%|██████▎   | 76410/121875 [23:53:02<13:49:22,  1.09s/it]

{'loss': 0.698, 'learning_rate': 1.8652307692307695e-05, 'epoch': 1.88}


 63%|██████▎   | 76440/121875 [23:53:35<13:51:22,  1.10s/it]

{'loss': 0.7339, 'learning_rate': 1.864e-05, 'epoch': 1.88}


 63%|██████▎   | 76470/121875 [23:54:08<13:53:46,  1.10s/it]

{'loss': 0.6566, 'learning_rate': 1.862769230769231e-05, 'epoch': 1.88}


 63%|██████▎   | 76500/121875 [23:54:41<13:57:01,  1.11s/it]

{'loss': 0.7335, 'learning_rate': 1.8615384615384616e-05, 'epoch': 1.88}


 63%|██████▎   | 76530/121875 [23:55:16<13:53:36,  1.10s/it]

{'loss': 0.7313, 'learning_rate': 1.860307692307692e-05, 'epoch': 1.88}


 63%|██████▎   | 76560/121875 [23:55:49<13:48:33,  1.10s/it]

{'loss': 0.6799, 'learning_rate': 1.859076923076923e-05, 'epoch': 1.88}


 63%|██████▎   | 76590/121875 [23:56:22<13:46:56,  1.10s/it]

{'loss': 0.6311, 'learning_rate': 1.857846153846154e-05, 'epoch': 1.89}


 63%|██████▎   | 76620/121875 [23:56:55<13:45:56,  1.10s/it]

{'loss': 0.7222, 'learning_rate': 1.8566153846153846e-05, 'epoch': 1.89}


 63%|██████▎   | 76650/121875 [23:57:28<13:49:09,  1.10s/it]

{'loss': 0.7122, 'learning_rate': 1.8553846153846155e-05, 'epoch': 1.89}


 63%|██████▎   | 76680/121875 [23:58:01<13:51:32,  1.10s/it]

{'loss': 0.6366, 'learning_rate': 1.8541538461538464e-05, 'epoch': 1.89}


 63%|██████▎   | 76710/121875 [23:58:34<13:49:30,  1.10s/it]

{'loss': 0.7459, 'learning_rate': 1.852923076923077e-05, 'epoch': 1.89}


 63%|██████▎   | 76740/121875 [23:59:07<13:47:37,  1.10s/it]

{'loss': 0.7222, 'learning_rate': 1.851692307692308e-05, 'epoch': 1.89}


 63%|██████▎   | 76770/121875 [23:59:40<13:46:49,  1.10s/it]

{'loss': 0.6966, 'learning_rate': 1.8504615384615384e-05, 'epoch': 1.89}


 63%|██████▎   | 76800/121875 [24:00:13<13:40:28,  1.09s/it]

{'loss': 0.6497, 'learning_rate': 1.8492307692307694e-05, 'epoch': 1.89}


 63%|██████▎   | 76830/121875 [24:00:46<13:48:26,  1.10s/it]

{'loss': 0.6611, 'learning_rate': 1.848e-05, 'epoch': 1.89}


 63%|██████▎   | 76860/121875 [24:01:19<13:46:52,  1.10s/it]

{'loss': 0.7061, 'learning_rate': 1.846769230769231e-05, 'epoch': 1.89}


 63%|██████▎   | 76890/121875 [24:01:52<13:42:48,  1.10s/it]

{'loss': 0.7193, 'learning_rate': 1.8455384615384617e-05, 'epoch': 1.89}


 63%|██████▎   | 76920/121875 [24:02:25<13:48:11,  1.11s/it]

{'loss': 0.7078, 'learning_rate': 1.8443076923076923e-05, 'epoch': 1.89}


 63%|██████▎   | 76950/121875 [24:02:58<13:43:55,  1.10s/it]

{'loss': 0.7029, 'learning_rate': 1.8430769230769232e-05, 'epoch': 1.89}


 63%|██████▎   | 76980/121875 [24:03:31<13:42:14,  1.10s/it]

{'loss': 0.7461, 'learning_rate': 1.841846153846154e-05, 'epoch': 1.89}


 63%|██████▎   | 77010/121875 [24:04:06<13:58:46,  1.12s/it]

{'loss': 0.6883, 'learning_rate': 1.8406153846153847e-05, 'epoch': 1.9}


 63%|██████▎   | 77040/121875 [24:04:39<13:41:02,  1.10s/it]

{'loss': 0.69, 'learning_rate': 1.8393846153846156e-05, 'epoch': 1.9}


 63%|██████▎   | 77070/121875 [24:05:12<13:43:14,  1.10s/it]

{'loss': 0.7265, 'learning_rate': 1.8381538461538462e-05, 'epoch': 1.9}


 63%|██████▎   | 77100/121875 [24:05:45<13:39:05,  1.10s/it]

{'loss': 0.6171, 'learning_rate': 1.8369230769230768e-05, 'epoch': 1.9}


 63%|██████▎   | 77130/121875 [24:06:18<13:36:39,  1.10s/it]

{'loss': 0.7341, 'learning_rate': 1.8356923076923077e-05, 'epoch': 1.9}


 63%|██████▎   | 77160/121875 [24:06:51<13:39:08,  1.10s/it]

{'loss': 0.6523, 'learning_rate': 1.8344615384615386e-05, 'epoch': 1.9}


 63%|██████▎   | 77190/121875 [24:07:24<13:36:43,  1.10s/it]

{'loss': 0.6964, 'learning_rate': 1.833230769230769e-05, 'epoch': 1.9}


 63%|██████▎   | 77220/121875 [24:07:57<13:37:59,  1.10s/it]

{'loss': 0.7229, 'learning_rate': 1.832e-05, 'epoch': 1.9}


 63%|██████▎   | 77250/121875 [24:08:30<13:39:51,  1.10s/it]

{'loss': 0.6298, 'learning_rate': 1.830769230769231e-05, 'epoch': 1.9}


 63%|██████▎   | 77280/121875 [24:09:03<13:36:08,  1.10s/it]

{'loss': 0.651, 'learning_rate': 1.8295384615384616e-05, 'epoch': 1.9}


 63%|██████▎   | 77310/121875 [24:09:36<13:36:55,  1.10s/it]

{'loss': 0.7229, 'learning_rate': 1.8283076923076925e-05, 'epoch': 1.9}


 63%|██████▎   | 77340/121875 [24:10:09<13:38:00,  1.10s/it]

{'loss': 0.6289, 'learning_rate': 1.8270769230769234e-05, 'epoch': 1.9}


 63%|██████▎   | 77370/121875 [24:10:42<13:36:29,  1.10s/it]

{'loss': 0.7345, 'learning_rate': 1.825846153846154e-05, 'epoch': 1.9}


 64%|██████▎   | 77400/121875 [24:11:15<13:35:52,  1.10s/it]

{'loss': 0.6541, 'learning_rate': 1.8246153846153845e-05, 'epoch': 1.91}


 64%|██████▎   | 77430/121875 [24:11:48<13:33:47,  1.10s/it]

{'loss': 0.7387, 'learning_rate': 1.8233846153846154e-05, 'epoch': 1.91}


 64%|██████▎   | 77460/121875 [24:12:21<13:34:27,  1.10s/it]

{'loss': 0.6979, 'learning_rate': 1.8221538461538464e-05, 'epoch': 1.91}


 64%|██████▎   | 77490/121875 [24:12:54<13:32:55,  1.10s/it]

{'loss': 0.756, 'learning_rate': 1.820923076923077e-05, 'epoch': 1.91}


 64%|██████▎   | 77520/121875 [24:13:29<13:36:02,  1.10s/it]

{'loss': 0.742, 'learning_rate': 1.819692307692308e-05, 'epoch': 1.91}


 64%|██████▎   | 77550/121875 [24:14:02<13:30:57,  1.10s/it]

{'loss': 0.6988, 'learning_rate': 1.8184615384615387e-05, 'epoch': 1.91}


 64%|██████▎   | 77580/121875 [24:14:35<13:31:06,  1.10s/it]

{'loss': 0.7132, 'learning_rate': 1.8172307692307693e-05, 'epoch': 1.91}


 64%|██████▎   | 77610/121875 [24:15:08<13:31:57,  1.10s/it]

{'loss': 0.6912, 'learning_rate': 1.8160000000000002e-05, 'epoch': 1.91}


 64%|██████▎   | 77640/121875 [24:15:41<13:30:12,  1.10s/it]

{'loss': 0.7043, 'learning_rate': 1.8147692307692308e-05, 'epoch': 1.91}


 64%|██████▎   | 77670/121875 [24:16:14<13:30:10,  1.10s/it]

{'loss': 0.7371, 'learning_rate': 1.8135384615384614e-05, 'epoch': 1.91}


 64%|██████▍   | 77700/121875 [24:16:47<13:31:42,  1.10s/it]

{'loss': 0.6719, 'learning_rate': 1.8123076923076923e-05, 'epoch': 1.91}


 64%|██████▍   | 77730/121875 [24:17:20<13:28:16,  1.10s/it]

{'loss': 0.7115, 'learning_rate': 1.8110769230769232e-05, 'epoch': 1.91}


 64%|██████▍   | 77760/121875 [24:17:53<13:27:53,  1.10s/it]

{'loss': 0.7161, 'learning_rate': 1.8098461538461538e-05, 'epoch': 1.91}


 64%|██████▍   | 77790/121875 [24:18:26<13:30:01,  1.10s/it]

{'loss': 0.6683, 'learning_rate': 1.8086153846153847e-05, 'epoch': 1.91}


 64%|██████▍   | 77820/121875 [24:18:59<13:22:53,  1.09s/it]

{'loss': 0.7168, 'learning_rate': 1.8073846153846156e-05, 'epoch': 1.92}


 64%|██████▍   | 77850/121875 [24:19:32<13:31:00,  1.11s/it]

{'loss': 0.609, 'learning_rate': 1.806153846153846e-05, 'epoch': 1.92}


 64%|██████▍   | 77880/121875 [24:20:05<13:26:13,  1.10s/it]

{'loss': 0.6183, 'learning_rate': 1.804923076923077e-05, 'epoch': 1.92}


 64%|██████▍   | 77910/121875 [24:20:38<13:23:18,  1.10s/it]

{'loss': 0.6229, 'learning_rate': 1.803692307692308e-05, 'epoch': 1.92}


 64%|██████▍   | 77940/121875 [24:21:11<13:27:24,  1.10s/it]

{'loss': 0.7045, 'learning_rate': 1.8024615384615386e-05, 'epoch': 1.92}


 64%|██████▍   | 77970/121875 [24:21:44<13:25:55,  1.10s/it]

{'loss': 0.7671, 'learning_rate': 1.801230769230769e-05, 'epoch': 1.92}


 64%|██████▍   | 78000/121875 [24:22:17<13:24:49,  1.10s/it]

{'loss': 0.6753, 'learning_rate': 1.8e-05, 'epoch': 1.92}


 64%|██████▍   | 78030/121875 [24:22:53<13:24:15,  1.10s/it]

{'loss': 0.6778, 'learning_rate': 1.798769230769231e-05, 'epoch': 1.92}


 64%|██████▍   | 78060/121875 [24:23:26<13:22:45,  1.10s/it]

{'loss': 0.6599, 'learning_rate': 1.7975384615384615e-05, 'epoch': 1.92}


 64%|██████▍   | 78090/121875 [24:23:59<13:24:33,  1.10s/it]

{'loss': 0.6249, 'learning_rate': 1.7963076923076924e-05, 'epoch': 1.92}


 64%|██████▍   | 78120/121875 [24:24:31<13:17:41,  1.09s/it]

{'loss': 0.6422, 'learning_rate': 1.7950769230769234e-05, 'epoch': 1.92}


 64%|██████▍   | 78150/121875 [24:25:05<13:22:52,  1.10s/it]

{'loss': 0.6949, 'learning_rate': 1.793846153846154e-05, 'epoch': 1.92}


 64%|██████▍   | 78180/121875 [24:25:37<13:24:26,  1.10s/it]

{'loss': 0.6976, 'learning_rate': 1.792615384615385e-05, 'epoch': 1.92}


 64%|██████▍   | 78210/121875 [24:26:11<13:22:32,  1.10s/it]

{'loss': 0.7261, 'learning_rate': 1.7913846153846157e-05, 'epoch': 1.93}


 64%|██████▍   | 78240/121875 [24:26:43<13:19:25,  1.10s/it]

{'loss': 0.6767, 'learning_rate': 1.790153846153846e-05, 'epoch': 1.93}


 64%|██████▍   | 78270/121875 [24:27:16<13:14:08,  1.09s/it]

{'loss': 0.7259, 'learning_rate': 1.788923076923077e-05, 'epoch': 1.93}


 64%|██████▍   | 78300/121875 [24:27:50<13:19:08,  1.10s/it]

{'loss': 0.6601, 'learning_rate': 1.7876923076923078e-05, 'epoch': 1.93}


 64%|██████▍   | 78330/121875 [24:28:22<13:15:24,  1.10s/it]

{'loss': 0.6634, 'learning_rate': 1.7864615384615384e-05, 'epoch': 1.93}


 64%|██████▍   | 78360/121875 [24:28:55<13:16:20,  1.10s/it]

{'loss': 0.7252, 'learning_rate': 1.7852307692307693e-05, 'epoch': 1.93}


 64%|██████▍   | 78390/121875 [24:29:29<13:20:16,  1.10s/it]

{'loss': 0.6788, 'learning_rate': 1.7840000000000002e-05, 'epoch': 1.93}


 64%|██████▍   | 78420/121875 [24:30:01<13:17:05,  1.10s/it]

{'loss': 0.6561, 'learning_rate': 1.7827692307692308e-05, 'epoch': 1.93}


 64%|██████▍   | 78450/121875 [24:30:34<13:17:08,  1.10s/it]

{'loss': 0.6691, 'learning_rate': 1.7815384615384617e-05, 'epoch': 1.93}


 64%|██████▍   | 78480/121875 [24:31:07<13:15:01,  1.10s/it]

{'loss': 0.6424, 'learning_rate': 1.7803076923076926e-05, 'epoch': 1.93}


 64%|██████▍   | 78510/121875 [24:31:43<13:35:34,  1.13s/it]

{'loss': 0.7299, 'learning_rate': 1.779076923076923e-05, 'epoch': 1.93}


 64%|██████▍   | 78540/121875 [24:32:16<13:16:01,  1.10s/it]

{'loss': 0.7265, 'learning_rate': 1.7778461538461537e-05, 'epoch': 1.93}


 64%|██████▍   | 78570/121875 [24:32:49<13:13:23,  1.10s/it]

{'loss': 0.7112, 'learning_rate': 1.7766153846153847e-05, 'epoch': 1.93}


 64%|██████▍   | 78600/121875 [24:33:22<13:16:08,  1.10s/it]

{'loss': 0.6959, 'learning_rate': 1.7753846153846156e-05, 'epoch': 1.93}


 65%|██████▍   | 78630/121875 [24:33:55<13:08:14,  1.09s/it]

{'loss': 0.7239, 'learning_rate': 1.774153846153846e-05, 'epoch': 1.94}


 65%|██████▍   | 78660/121875 [24:34:28<13:11:27,  1.10s/it]

{'loss': 0.6921, 'learning_rate': 1.772923076923077e-05, 'epoch': 1.94}


 65%|██████▍   | 78690/121875 [24:35:01<13:10:33,  1.10s/it]

{'loss': 0.6765, 'learning_rate': 1.771692307692308e-05, 'epoch': 1.94}


 65%|██████▍   | 78720/121875 [24:35:34<13:14:43,  1.10s/it]

{'loss': 0.708, 'learning_rate': 1.7704615384615385e-05, 'epoch': 1.94}


 65%|██████▍   | 78750/121875 [24:36:07<13:11:18,  1.10s/it]

{'loss': 0.6931, 'learning_rate': 1.7692307692307694e-05, 'epoch': 1.94}


 65%|██████▍   | 78780/121875 [24:36:40<13:14:53,  1.11s/it]

{'loss': 0.6742, 'learning_rate': 1.7680000000000004e-05, 'epoch': 1.94}


 65%|██████▍   | 78810/121875 [24:37:13<13:10:02,  1.10s/it]

{'loss': 0.72, 'learning_rate': 1.7667692307692306e-05, 'epoch': 1.94}


 65%|██████▍   | 78840/121875 [24:37:46<13:08:44,  1.10s/it]

{'loss': 0.7215, 'learning_rate': 1.7655384615384615e-05, 'epoch': 1.94}


 65%|██████▍   | 78870/121875 [24:38:19<13:07:15,  1.10s/it]

{'loss': 0.6558, 'learning_rate': 1.7643076923076924e-05, 'epoch': 1.94}


 65%|██████▍   | 78900/121875 [24:38:52<13:09:00,  1.10s/it]

{'loss': 0.6843, 'learning_rate': 1.763076923076923e-05, 'epoch': 1.94}


 65%|██████▍   | 78930/121875 [24:39:25<13:08:57,  1.10s/it]

{'loss': 0.7321, 'learning_rate': 1.761846153846154e-05, 'epoch': 1.94}


 65%|██████▍   | 78960/121875 [24:39:58<13:09:05,  1.10s/it]

{'loss': 0.7328, 'learning_rate': 1.7606153846153848e-05, 'epoch': 1.94}


 65%|██████▍   | 78990/121875 [24:40:31<13:07:47,  1.10s/it]

{'loss': 0.7128, 'learning_rate': 1.7593846153846154e-05, 'epoch': 1.94}


 65%|██████▍   | 79020/121875 [24:41:06<13:06:18,  1.10s/it]

{'loss': 0.7236, 'learning_rate': 1.7581538461538463e-05, 'epoch': 1.95}


 65%|██████▍   | 79050/121875 [24:41:39<13:07:19,  1.10s/it]

{'loss': 0.689, 'learning_rate': 1.7569230769230772e-05, 'epoch': 1.95}


 65%|██████▍   | 79080/121875 [24:42:12<13:04:13,  1.10s/it]

{'loss': 0.718, 'learning_rate': 1.7556923076923078e-05, 'epoch': 1.95}


 65%|██████▍   | 79110/121875 [24:42:45<12:59:42,  1.09s/it]

{'loss': 0.7395, 'learning_rate': 1.7544615384615383e-05, 'epoch': 1.95}


 65%|██████▍   | 79140/121875 [24:43:18<13:05:10,  1.10s/it]

{'loss': 0.6889, 'learning_rate': 1.7532307692307693e-05, 'epoch': 1.95}


 65%|██████▍   | 79170/121875 [24:43:51<13:03:58,  1.10s/it]

{'loss': 0.7666, 'learning_rate': 1.752e-05, 'epoch': 1.95}


 65%|██████▍   | 79200/121875 [24:44:24<13:03:04,  1.10s/it]

{'loss': 0.6748, 'learning_rate': 1.7507692307692307e-05, 'epoch': 1.95}


 65%|██████▌   | 79230/121875 [24:44:57<13:10:25,  1.11s/it]

{'loss': 0.7325, 'learning_rate': 1.7495384615384617e-05, 'epoch': 1.95}


 65%|██████▌   | 79260/121875 [24:45:31<13:29:11,  1.14s/it]

{'loss': 0.7149, 'learning_rate': 1.7483076923076926e-05, 'epoch': 1.95}


 65%|██████▌   | 79290/121875 [24:46:05<13:23:48,  1.13s/it]

{'loss': 0.6619, 'learning_rate': 1.747076923076923e-05, 'epoch': 1.95}


 65%|██████▌   | 79320/121875 [24:46:39<13:19:29,  1.13s/it]

{'loss': 0.6474, 'learning_rate': 1.745846153846154e-05, 'epoch': 1.95}


 65%|██████▌   | 79350/121875 [24:47:13<13:26:23,  1.14s/it]

{'loss': 0.6718, 'learning_rate': 1.744615384615385e-05, 'epoch': 1.95}


 65%|██████▌   | 79380/121875 [24:47:47<13:23:38,  1.13s/it]

{'loss': 0.6803, 'learning_rate': 1.7433846153846152e-05, 'epoch': 1.95}


 65%|██████▌   | 79410/121875 [24:48:21<13:24:23,  1.14s/it]

{'loss': 0.6421, 'learning_rate': 1.742153846153846e-05, 'epoch': 1.95}


 65%|██████▌   | 79440/121875 [24:48:55<12:56:41,  1.10s/it]

{'loss': 0.6951, 'learning_rate': 1.740923076923077e-05, 'epoch': 1.96}


 65%|██████▌   | 79470/121875 [24:49:29<13:12:42,  1.12s/it]

{'loss': 0.6493, 'learning_rate': 1.7396923076923076e-05, 'epoch': 1.96}


 65%|██████▌   | 79500/121875 [24:50:02<12:59:41,  1.10s/it]

{'loss': 0.6563, 'learning_rate': 1.7384615384615385e-05, 'epoch': 1.96}


 65%|██████▌   | 79530/121875 [24:50:37<12:44:12,  1.08s/it]

{'loss': 0.7401, 'learning_rate': 1.7372307692307694e-05, 'epoch': 1.96}


 65%|██████▌   | 79560/121875 [24:51:10<13:21:14,  1.14s/it]

{'loss': 0.6514, 'learning_rate': 1.736e-05, 'epoch': 1.96}


 65%|██████▌   | 79590/121875 [24:51:44<12:55:22,  1.10s/it]

{'loss': 0.6739, 'learning_rate': 1.734769230769231e-05, 'epoch': 1.96}


 65%|██████▌   | 79620/121875 [24:52:17<12:52:38,  1.10s/it]

{'loss': 0.6159, 'learning_rate': 1.7335384615384618e-05, 'epoch': 1.96}


 65%|██████▌   | 79650/121875 [24:52:50<12:48:25,  1.09s/it]

{'loss': 0.7394, 'learning_rate': 1.7323076923076924e-05, 'epoch': 1.96}


 65%|██████▌   | 79680/121875 [24:53:23<12:48:07,  1.09s/it]

{'loss': 0.6961, 'learning_rate': 1.731076923076923e-05, 'epoch': 1.96}


 65%|██████▌   | 79710/121875 [24:53:55<12:43:02,  1.09s/it]

{'loss': 0.7688, 'learning_rate': 1.729846153846154e-05, 'epoch': 1.96}


 65%|██████▌   | 79740/121875 [24:54:28<12:41:43,  1.08s/it]

{'loss': 0.7193, 'learning_rate': 1.7286153846153848e-05, 'epoch': 1.96}


 65%|██████▌   | 79770/121875 [24:55:00<12:42:26,  1.09s/it]

{'loss': 0.6469, 'learning_rate': 1.7273846153846153e-05, 'epoch': 1.96}


 65%|██████▌   | 79800/121875 [24:55:33<12:52:55,  1.10s/it]

{'loss': 0.6807, 'learning_rate': 1.7261538461538463e-05, 'epoch': 1.96}


 66%|██████▌   | 79830/121875 [24:56:06<12:48:42,  1.10s/it]

{'loss': 0.7382, 'learning_rate': 1.724923076923077e-05, 'epoch': 1.97}


 66%|██████▌   | 79860/121875 [24:56:39<12:41:49,  1.09s/it]

{'loss': 0.7295, 'learning_rate': 1.7236923076923077e-05, 'epoch': 1.97}


 66%|██████▌   | 79890/121875 [24:57:12<12:46:34,  1.10s/it]

{'loss': 0.6461, 'learning_rate': 1.7224615384615387e-05, 'epoch': 1.97}


 66%|██████▌   | 79920/121875 [24:57:45<13:13:22,  1.13s/it]

{'loss': 0.6908, 'learning_rate': 1.7212307692307696e-05, 'epoch': 1.97}


 66%|██████▌   | 79950/121875 [24:58:19<13:13:15,  1.14s/it]

{'loss': 0.6814, 'learning_rate': 1.7199999999999998e-05, 'epoch': 1.97}


 66%|██████▌   | 79980/121875 [24:58:53<13:04:07,  1.12s/it]

{'loss': 0.6663, 'learning_rate': 1.7187692307692307e-05, 'epoch': 1.97}


 66%|██████▌   | 80010/121875 [24:59:29<13:24:52,  1.15s/it]

{'loss': 0.6844, 'learning_rate': 1.7175384615384616e-05, 'epoch': 1.97}


 66%|██████▌   | 80040/121875 [25:00:03<13:06:30,  1.13s/it]

{'loss': 0.6478, 'learning_rate': 1.7163076923076922e-05, 'epoch': 1.97}


 66%|██████▌   | 80070/121875 [25:00:37<13:03:20,  1.12s/it]

{'loss': 0.7461, 'learning_rate': 1.715076923076923e-05, 'epoch': 1.97}


 66%|██████▌   | 80100/121875 [25:01:10<13:05:02,  1.13s/it]

{'loss': 0.6674, 'learning_rate': 1.713846153846154e-05, 'epoch': 1.97}


 66%|██████▌   | 80130/121875 [25:01:44<13:09:16,  1.13s/it]

{'loss': 0.6866, 'learning_rate': 1.7126153846153846e-05, 'epoch': 1.97}


 66%|██████▌   | 80160/121875 [25:02:18<13:08:14,  1.13s/it]

{'loss': 0.6751, 'learning_rate': 1.7113846153846155e-05, 'epoch': 1.97}


 66%|██████▌   | 80190/121875 [25:02:52<12:55:24,  1.12s/it]

{'loss': 0.6831, 'learning_rate': 1.7101538461538464e-05, 'epoch': 1.97}


 66%|██████▌   | 80220/121875 [25:03:26<13:05:24,  1.13s/it]

{'loss': 0.6935, 'learning_rate': 1.708923076923077e-05, 'epoch': 1.97}


 66%|██████▌   | 80250/121875 [25:04:00<13:05:11,  1.13s/it]

{'loss': 0.7495, 'learning_rate': 1.7076923076923076e-05, 'epoch': 1.98}


 66%|██████▌   | 80280/121875 [25:04:34<13:02:15,  1.13s/it]

{'loss': 0.714, 'learning_rate': 1.7064615384615385e-05, 'epoch': 1.98}


 66%|██████▌   | 80310/121875 [25:05:08<13:04:46,  1.13s/it]

{'loss': 0.6925, 'learning_rate': 1.7052307692307694e-05, 'epoch': 1.98}


 66%|██████▌   | 80340/121875 [25:05:41<12:58:50,  1.13s/it]

{'loss': 0.6161, 'learning_rate': 1.704e-05, 'epoch': 1.98}


 66%|██████▌   | 80370/121875 [25:06:15<13:04:18,  1.13s/it]

{'loss': 0.6492, 'learning_rate': 1.702769230769231e-05, 'epoch': 1.98}


 66%|██████▌   | 80400/121875 [25:06:49<12:59:45,  1.13s/it]

{'loss': 0.637, 'learning_rate': 1.7015384615384618e-05, 'epoch': 1.98}


 66%|██████▌   | 80430/121875 [25:07:23<12:55:12,  1.12s/it]

{'loss': 0.7051, 'learning_rate': 1.7003076923076923e-05, 'epoch': 1.98}


 66%|██████▌   | 80460/121875 [25:07:57<12:58:05,  1.13s/it]

{'loss': 0.7509, 'learning_rate': 1.6990769230769233e-05, 'epoch': 1.98}


 66%|██████▌   | 80490/121875 [25:08:31<13:00:26,  1.13s/it]

{'loss': 0.6607, 'learning_rate': 1.697846153846154e-05, 'epoch': 1.98}


 66%|██████▌   | 80520/121875 [25:09:07<12:59:10,  1.13s/it]

{'loss': 0.6806, 'learning_rate': 1.6966153846153847e-05, 'epoch': 1.98}


 66%|██████▌   | 80550/121875 [25:09:41<13:01:41,  1.13s/it]

{'loss': 0.7021, 'learning_rate': 1.6953846153846153e-05, 'epoch': 1.98}


 66%|██████▌   | 80580/121875 [25:10:15<12:34:19,  1.10s/it]

{'loss': 0.6506, 'learning_rate': 1.6941538461538462e-05, 'epoch': 1.98}


 66%|██████▌   | 80610/121875 [25:10:47<12:24:46,  1.08s/it]

{'loss': 0.6729, 'learning_rate': 1.6929230769230768e-05, 'epoch': 1.98}


 66%|██████▌   | 80640/121875 [25:11:19<12:18:58,  1.08s/it]

{'loss': 0.6497, 'learning_rate': 1.6916923076923077e-05, 'epoch': 1.98}


 66%|██████▌   | 80670/121875 [25:11:52<12:21:22,  1.08s/it]

{'loss': 0.6981, 'learning_rate': 1.6904615384615386e-05, 'epoch': 1.99}


 66%|██████▌   | 80700/121875 [25:12:24<12:20:43,  1.08s/it]

{'loss': 0.7575, 'learning_rate': 1.6892307692307692e-05, 'epoch': 1.99}


 66%|██████▌   | 80730/121875 [25:12:56<12:20:13,  1.08s/it]

{'loss': 0.6483, 'learning_rate': 1.688e-05, 'epoch': 1.99}


 66%|██████▋   | 80760/121875 [25:13:29<12:17:43,  1.08s/it]

{'loss': 0.7202, 'learning_rate': 1.686769230769231e-05, 'epoch': 1.99}


 66%|██████▋   | 80790/121875 [25:14:01<12:13:10,  1.07s/it]

{'loss': 0.6986, 'learning_rate': 1.6855384615384616e-05, 'epoch': 1.99}


 66%|██████▋   | 80820/121875 [25:14:33<12:10:55,  1.07s/it]

{'loss': 0.6859, 'learning_rate': 1.684307692307692e-05, 'epoch': 1.99}


 66%|██████▋   | 80850/121875 [25:15:05<12:19:52,  1.08s/it]

{'loss': 0.7375, 'learning_rate': 1.683076923076923e-05, 'epoch': 1.99}


 66%|██████▋   | 80880/121875 [25:15:38<12:18:23,  1.08s/it]

{'loss': 0.7004, 'learning_rate': 1.681846153846154e-05, 'epoch': 1.99}


 66%|██████▋   | 80910/121875 [25:16:10<12:12:03,  1.07s/it]

{'loss': 0.7197, 'learning_rate': 1.6806153846153846e-05, 'epoch': 1.99}


 66%|██████▋   | 80940/121875 [25:16:42<12:16:55,  1.08s/it]

{'loss': 0.7307, 'learning_rate': 1.6793846153846155e-05, 'epoch': 1.99}


 66%|██████▋   | 80970/121875 [25:17:15<12:14:44,  1.08s/it]

{'loss': 0.7424, 'learning_rate': 1.6781538461538464e-05, 'epoch': 1.99}


 66%|██████▋   | 81000/121875 [25:17:47<12:14:34,  1.08s/it]

{'loss': 0.7472, 'learning_rate': 1.676923076923077e-05, 'epoch': 1.99}


 66%|██████▋   | 81030/121875 [25:18:22<12:12:45,  1.08s/it]

{'loss': 0.7095, 'learning_rate': 1.675692307692308e-05, 'epoch': 1.99}


 67%|██████▋   | 81060/121875 [25:18:54<12:16:10,  1.08s/it]

{'loss': 0.6599, 'learning_rate': 1.6744615384615388e-05, 'epoch': 2.0}


 67%|██████▋   | 81090/121875 [25:19:27<12:15:59,  1.08s/it]

{'loss': 0.7442, 'learning_rate': 1.6732307692307693e-05, 'epoch': 2.0}


 67%|██████▋   | 81120/121875 [25:19:59<12:12:37,  1.08s/it]

{'loss': 0.6882, 'learning_rate': 1.672e-05, 'epoch': 2.0}


 67%|██████▋   | 81150/121875 [25:20:31<12:12:40,  1.08s/it]

{'loss': 0.6117, 'learning_rate': 1.6707692307692308e-05, 'epoch': 2.0}


 67%|██████▋   | 81180/121875 [25:21:04<12:08:37,  1.07s/it]

{'loss': 0.7143, 'learning_rate': 1.6695384615384614e-05, 'epoch': 2.0}


 67%|██████▋   | 81210/121875 [25:21:36<12:09:52,  1.08s/it]

{'loss': 0.6715, 'learning_rate': 1.6683076923076923e-05, 'epoch': 2.0}


 67%|██████▋   | 81240/121875 [25:22:09<12:08:57,  1.08s/it]

{'loss': 0.7125, 'learning_rate': 1.6670769230769232e-05, 'epoch': 2.0}


 67%|██████▋   | 81250/121875 [25:38:51<12:11:30,  1.08s/it]

{'eval_loss': 0.7221324443817139, 'eval_accuracy': 0.68482, 'eval_runtime': 991.3631, 'eval_samples_per_second': 50.436, 'eval_steps_per_second': 6.304, 'epoch': 2.0}


 67%|██████▋   | 81270/121875 [25:39:13<16:09:32,  1.43s/it]

{'loss': 0.654, 'learning_rate': 1.6658461538461538e-05, 'epoch': 2.0}


 67%|██████▋   | 81300/121875 [25:39:45<12:07:13,  1.08s/it]

{'loss': 0.5895, 'learning_rate': 1.6646153846153847e-05, 'epoch': 2.0}


 67%|██████▋   | 81330/121875 [25:40:17<12:07:32,  1.08s/it]

{'loss': 0.6165, 'learning_rate': 1.6633846153846156e-05, 'epoch': 2.0}


 67%|██████▋   | 81360/121875 [25:40:50<12:07:07,  1.08s/it]

{'loss': 0.6414, 'learning_rate': 1.6621538461538462e-05, 'epoch': 2.0}


 67%|██████▋   | 81390/121875 [25:41:22<12:04:45,  1.07s/it]

{'loss': 0.7083, 'learning_rate': 1.660923076923077e-05, 'epoch': 2.0}


 67%|██████▋   | 81420/121875 [25:41:55<12:07:03,  1.08s/it]

{'loss': 0.6385, 'learning_rate': 1.6596923076923077e-05, 'epoch': 2.0}


 67%|██████▋   | 81450/121875 [25:42:27<12:09:23,  1.08s/it]

{'loss': 0.6113, 'learning_rate': 1.6584615384615386e-05, 'epoch': 2.0}


 67%|██████▋   | 81480/121875 [25:42:59<12:06:19,  1.08s/it]

{'loss': 0.6543, 'learning_rate': 1.657230769230769e-05, 'epoch': 2.01}


 67%|██████▋   | 81510/121875 [25:43:34<12:22:26,  1.10s/it]

{'loss': 0.6214, 'learning_rate': 1.656e-05, 'epoch': 2.01}


 67%|██████▋   | 81540/121875 [25:44:06<12:01:10,  1.07s/it]

{'loss': 0.6093, 'learning_rate': 1.654769230769231e-05, 'epoch': 2.01}


 67%|██████▋   | 81570/121875 [25:44:39<12:00:17,  1.07s/it]

{'loss': 0.6062, 'learning_rate': 1.6535384615384616e-05, 'epoch': 2.01}


 67%|██████▋   | 81600/121875 [25:45:11<12:03:33,  1.08s/it]

{'loss': 0.6775, 'learning_rate': 1.6523076923076925e-05, 'epoch': 2.01}


 67%|██████▋   | 81630/121875 [25:45:43<12:03:41,  1.08s/it]

{'loss': 0.5829, 'learning_rate': 1.6510769230769234e-05, 'epoch': 2.01}


 67%|██████▋   | 81660/121875 [25:46:16<12:01:31,  1.08s/it]

{'loss': 0.6571, 'learning_rate': 1.649846153846154e-05, 'epoch': 2.01}


 67%|██████▋   | 81690/121875 [25:46:48<12:03:36,  1.08s/it]

{'loss': 0.5868, 'learning_rate': 1.6486153846153845e-05, 'epoch': 2.01}


 67%|██████▋   | 81720/121875 [25:47:20<12:01:44,  1.08s/it]

{'loss': 0.6138, 'learning_rate': 1.6473846153846154e-05, 'epoch': 2.01}


 67%|██████▋   | 81750/121875 [25:47:53<12:02:32,  1.08s/it]

{'loss': 0.5987, 'learning_rate': 1.646153846153846e-05, 'epoch': 2.01}


 67%|██████▋   | 81780/121875 [25:48:25<11:59:48,  1.08s/it]

{'loss': 0.6587, 'learning_rate': 1.644923076923077e-05, 'epoch': 2.01}


 67%|██████▋   | 81810/121875 [25:48:57<12:01:06,  1.08s/it]

{'loss': 0.6371, 'learning_rate': 1.6436923076923078e-05, 'epoch': 2.01}


 67%|██████▋   | 81840/121875 [25:49:30<11:56:29,  1.07s/it]

{'loss': 0.5939, 'learning_rate': 1.6424615384615384e-05, 'epoch': 2.01}


 67%|██████▋   | 81870/121875 [25:50:02<12:00:31,  1.08s/it]

{'loss': 0.6588, 'learning_rate': 1.6412307692307693e-05, 'epoch': 2.02}


 67%|██████▋   | 81900/121875 [25:50:35<12:00:41,  1.08s/it]

{'loss': 0.6176, 'learning_rate': 1.6400000000000002e-05, 'epoch': 2.02}


 67%|██████▋   | 81930/121875 [25:51:07<12:00:29,  1.08s/it]

{'loss': 0.6388, 'learning_rate': 1.6387692307692308e-05, 'epoch': 2.02}


 67%|██████▋   | 81960/121875 [25:51:39<11:55:51,  1.08s/it]

{'loss': 0.6384, 'learning_rate': 1.6375384615384617e-05, 'epoch': 2.02}


 67%|██████▋   | 81990/121875 [25:52:12<11:55:05,  1.08s/it]

{'loss': 0.6591, 'learning_rate': 1.6363076923076923e-05, 'epoch': 2.02}


 67%|██████▋   | 82020/121875 [25:52:47<11:56:29,  1.08s/it]

{'loss': 0.592, 'learning_rate': 1.6350769230769232e-05, 'epoch': 2.02}


 67%|██████▋   | 82050/121875 [25:53:19<11:57:39,  1.08s/it]

{'loss': 0.6259, 'learning_rate': 1.6338461538461538e-05, 'epoch': 2.02}


 67%|██████▋   | 82080/121875 [25:53:51<11:54:53,  1.08s/it]

{'loss': 0.6755, 'learning_rate': 1.6326153846153847e-05, 'epoch': 2.02}


 67%|██████▋   | 82110/121875 [25:54:24<11:56:43,  1.08s/it]

{'loss': 0.6443, 'learning_rate': 1.6313846153846156e-05, 'epoch': 2.02}


 67%|██████▋   | 82140/121875 [25:54:56<11:55:18,  1.08s/it]

{'loss': 0.6283, 'learning_rate': 1.630153846153846e-05, 'epoch': 2.02}


 67%|██████▋   | 82170/121875 [25:55:28<11:55:06,  1.08s/it]

{'loss': 0.6598, 'learning_rate': 1.628923076923077e-05, 'epoch': 2.02}


 67%|██████▋   | 82200/121875 [25:56:01<11:56:04,  1.08s/it]

{'loss': 0.6308, 'learning_rate': 1.627692307692308e-05, 'epoch': 2.02}


 67%|██████▋   | 82230/121875 [25:56:33<11:52:40,  1.08s/it]

{'loss': 0.6556, 'learning_rate': 1.6264615384615386e-05, 'epoch': 2.02}


 67%|██████▋   | 82260/121875 [25:57:06<11:49:19,  1.07s/it]

{'loss': 0.6448, 'learning_rate': 1.6252307692307695e-05, 'epoch': 2.02}


 68%|██████▊   | 82290/121875 [25:57:38<11:48:43,  1.07s/it]

{'loss': 0.5578, 'learning_rate': 1.624e-05, 'epoch': 2.03}


 68%|██████▊   | 82320/121875 [25:58:10<11:50:03,  1.08s/it]

{'loss': 0.5781, 'learning_rate': 1.6227692307692306e-05, 'epoch': 2.03}


 68%|██████▊   | 82350/121875 [25:58:43<11:50:12,  1.08s/it]

{'loss': 0.5923, 'learning_rate': 1.6215384615384615e-05, 'epoch': 2.03}


 68%|██████▊   | 82380/121875 [25:59:15<11:47:28,  1.07s/it]

{'loss': 0.5733, 'learning_rate': 1.6203076923076924e-05, 'epoch': 2.03}


 68%|██████▊   | 82410/121875 [25:59:47<11:49:33,  1.08s/it]

{'loss': 0.6171, 'learning_rate': 1.619076923076923e-05, 'epoch': 2.03}


 68%|██████▊   | 82440/121875 [26:00:20<11:55:52,  1.09s/it]

{'loss': 0.6703, 'learning_rate': 1.617846153846154e-05, 'epoch': 2.03}


 68%|██████▊   | 82470/121875 [26:00:52<11:48:36,  1.08s/it]

{'loss': 0.621, 'learning_rate': 1.6166153846153848e-05, 'epoch': 2.03}


 68%|██████▊   | 82500/121875 [26:01:25<11:48:08,  1.08s/it]

{'loss': 0.6366, 'learning_rate': 1.6153846153846154e-05, 'epoch': 2.03}


 68%|██████▊   | 82530/121875 [26:01:59<11:46:47,  1.08s/it]

{'loss': 0.5981, 'learning_rate': 1.6141538461538463e-05, 'epoch': 2.03}


 68%|██████▊   | 82560/121875 [26:02:32<11:46:20,  1.08s/it]

{'loss': 0.5976, 'learning_rate': 1.612923076923077e-05, 'epoch': 2.03}


 68%|██████▊   | 82590/121875 [26:03:04<11:45:50,  1.08s/it]

{'loss': 0.6418, 'learning_rate': 1.6116923076923078e-05, 'epoch': 2.03}


 68%|██████▊   | 82620/121875 [26:03:36<11:49:05,  1.08s/it]

{'loss': 0.5891, 'learning_rate': 1.6104615384615384e-05, 'epoch': 2.03}


 68%|██████▊   | 82650/121875 [26:04:09<11:49:50,  1.09s/it]

{'loss': 0.6254, 'learning_rate': 1.6092307692307693e-05, 'epoch': 2.03}


 68%|██████▊   | 82680/121875 [26:04:41<11:45:30,  1.08s/it]

{'loss': 0.624, 'learning_rate': 1.6080000000000002e-05, 'epoch': 2.04}


 68%|██████▊   | 82710/121875 [26:05:14<11:50:05,  1.09s/it]

{'loss': 0.638, 'learning_rate': 1.6067692307692308e-05, 'epoch': 2.04}


 68%|██████▊   | 82740/121875 [26:05:46<11:45:38,  1.08s/it]

{'loss': 0.6029, 'learning_rate': 1.6055384615384617e-05, 'epoch': 2.04}


 68%|██████▊   | 82770/121875 [26:06:18<11:46:35,  1.08s/it]

{'loss': 0.6505, 'learning_rate': 1.6043076923076926e-05, 'epoch': 2.04}


 68%|██████▊   | 82800/121875 [26:06:51<11:45:07,  1.08s/it]

{'loss': 0.6569, 'learning_rate': 1.603076923076923e-05, 'epoch': 2.04}


 68%|██████▊   | 82830/121875 [26:07:23<11:41:08,  1.08s/it]

{'loss': 0.5937, 'learning_rate': 1.601846153846154e-05, 'epoch': 2.04}


 68%|██████▊   | 82860/121875 [26:07:56<11:41:53,  1.08s/it]

{'loss': 0.5996, 'learning_rate': 1.6006153846153846e-05, 'epoch': 2.04}


 68%|██████▊   | 82890/121875 [26:08:28<11:39:06,  1.08s/it]

{'loss': 0.6131, 'learning_rate': 1.5993846153846152e-05, 'epoch': 2.04}


 68%|██████▊   | 82920/121875 [26:09:00<11:41:51,  1.08s/it]

{'loss': 0.5708, 'learning_rate': 1.598153846153846e-05, 'epoch': 2.04}


 68%|██████▊   | 82950/121875 [26:09:33<11:41:59,  1.08s/it]

{'loss': 0.6588, 'learning_rate': 1.596923076923077e-05, 'epoch': 2.04}


 68%|██████▊   | 82980/121875 [26:10:05<11:38:49,  1.08s/it]

{'loss': 0.6542, 'learning_rate': 1.5956923076923076e-05, 'epoch': 2.04}


 68%|██████▊   | 83010/121875 [26:10:40<11:56:12,  1.11s/it]

{'loss': 0.5967, 'learning_rate': 1.5944615384615385e-05, 'epoch': 2.04}


 68%|██████▊   | 83040/121875 [26:11:12<11:38:34,  1.08s/it]

{'loss': 0.6082, 'learning_rate': 1.5932307692307694e-05, 'epoch': 2.04}


 68%|██████▊   | 83070/121875 [26:11:45<11:41:22,  1.08s/it]

{'loss': 0.6178, 'learning_rate': 1.592e-05, 'epoch': 2.04}


 68%|██████▊   | 83100/121875 [26:12:17<11:44:14,  1.09s/it]

{'loss': 0.5811, 'learning_rate': 1.590769230769231e-05, 'epoch': 2.05}


 68%|██████▊   | 83130/121875 [26:12:50<11:40:30,  1.08s/it]

{'loss': 0.5691, 'learning_rate': 1.5895384615384615e-05, 'epoch': 2.05}


 68%|██████▊   | 83160/121875 [26:13:22<11:36:23,  1.08s/it]

{'loss': 0.6462, 'learning_rate': 1.5883076923076924e-05, 'epoch': 2.05}


 68%|██████▊   | 83190/121875 [26:13:54<11:36:34,  1.08s/it]

{'loss': 0.6513, 'learning_rate': 1.587076923076923e-05, 'epoch': 2.05}


 68%|██████▊   | 83220/121875 [26:14:27<11:35:53,  1.08s/it]

{'loss': 0.6058, 'learning_rate': 1.585846153846154e-05, 'epoch': 2.05}


 68%|██████▊   | 83250/121875 [26:14:59<11:32:48,  1.08s/it]

{'loss': 0.5953, 'learning_rate': 1.5846153846153848e-05, 'epoch': 2.05}


 68%|██████▊   | 83280/121875 [26:15:31<11:36:22,  1.08s/it]

{'loss': 0.6901, 'learning_rate': 1.5833846153846154e-05, 'epoch': 2.05}


 68%|██████▊   | 83310/121875 [26:16:04<11:32:37,  1.08s/it]

{'loss': 0.6243, 'learning_rate': 1.5821538461538463e-05, 'epoch': 2.05}


 68%|██████▊   | 83340/121875 [26:16:36<11:33:02,  1.08s/it]

{'loss': 0.6498, 'learning_rate': 1.5809230769230772e-05, 'epoch': 2.05}


 68%|██████▊   | 83370/121875 [26:17:09<11:33:31,  1.08s/it]

{'loss': 0.6076, 'learning_rate': 1.5796923076923078e-05, 'epoch': 2.05}


 68%|██████▊   | 83400/121875 [26:17:41<11:30:22,  1.08s/it]

{'loss': 0.6337, 'learning_rate': 1.5784615384615387e-05, 'epoch': 2.05}


 68%|██████▊   | 83430/121875 [26:18:13<11:32:28,  1.08s/it]

{'loss': 0.5911, 'learning_rate': 1.5772307692307692e-05, 'epoch': 2.05}


 68%|██████▊   | 83460/121875 [26:18:46<11:30:44,  1.08s/it]

{'loss': 0.5431, 'learning_rate': 1.5759999999999998e-05, 'epoch': 2.05}


 69%|██████▊   | 83490/121875 [26:19:18<11:26:44,  1.07s/it]

{'loss': 0.5938, 'learning_rate': 1.5747692307692307e-05, 'epoch': 2.06}


 69%|██████▊   | 83520/121875 [26:19:53<11:31:13,  1.08s/it]

{'loss': 0.6367, 'learning_rate': 1.5735384615384616e-05, 'epoch': 2.06}


 69%|██████▊   | 83550/121875 [26:20:25<11:29:08,  1.08s/it]

{'loss': 0.6521, 'learning_rate': 1.5723076923076922e-05, 'epoch': 2.06}


 69%|██████▊   | 83580/121875 [26:20:57<11:27:04,  1.08s/it]

{'loss': 0.6047, 'learning_rate': 1.571076923076923e-05, 'epoch': 2.06}


 69%|██████▊   | 83610/121875 [26:21:30<11:29:22,  1.08s/it]

{'loss': 0.6619, 'learning_rate': 1.569846153846154e-05, 'epoch': 2.06}


 69%|██████▊   | 83640/121875 [26:22:02<11:22:43,  1.07s/it]

{'loss': 0.5598, 'learning_rate': 1.5686153846153846e-05, 'epoch': 2.06}


 69%|██████▊   | 83670/121875 [26:22:34<11:25:17,  1.08s/it]

{'loss': 0.5569, 'learning_rate': 1.5673846153846155e-05, 'epoch': 2.06}


 69%|██████▊   | 83700/121875 [26:23:07<11:23:41,  1.07s/it]

{'loss': 0.6709, 'learning_rate': 1.5661538461538464e-05, 'epoch': 2.06}


 69%|██████▊   | 83730/121875 [26:23:39<11:26:29,  1.08s/it]

{'loss': 0.6262, 'learning_rate': 1.564923076923077e-05, 'epoch': 2.06}


 69%|██████▊   | 83760/121875 [26:24:12<11:25:32,  1.08s/it]

{'loss': 0.6637, 'learning_rate': 1.5636923076923076e-05, 'epoch': 2.06}


 69%|██████▉   | 83790/121875 [26:24:44<11:22:33,  1.08s/it]

{'loss': 0.5845, 'learning_rate': 1.5624615384615385e-05, 'epoch': 2.06}


 69%|██████▉   | 83820/121875 [26:25:16<11:24:56,  1.08s/it]

{'loss': 0.6391, 'learning_rate': 1.5612307692307694e-05, 'epoch': 2.06}


 69%|██████▉   | 83850/121875 [26:25:49<11:22:15,  1.08s/it]

{'loss': 0.589, 'learning_rate': 1.56e-05, 'epoch': 2.06}


 69%|██████▉   | 83880/121875 [26:26:21<11:24:04,  1.08s/it]

{'loss': 0.5907, 'learning_rate': 1.558769230769231e-05, 'epoch': 2.06}


 69%|██████▉   | 83910/121875 [26:26:53<11:27:25,  1.09s/it]

{'loss': 0.5795, 'learning_rate': 1.5575384615384618e-05, 'epoch': 2.07}


 69%|██████▉   | 83940/121875 [26:27:26<11:20:11,  1.08s/it]

{'loss': 0.6667, 'learning_rate': 1.5563076923076924e-05, 'epoch': 2.07}


 69%|██████▉   | 83970/121875 [26:27:58<11:21:34,  1.08s/it]

{'loss': 0.6344, 'learning_rate': 1.5550769230769233e-05, 'epoch': 2.07}


 69%|██████▉   | 84000/121875 [26:28:31<11:21:37,  1.08s/it]

{'loss': 0.6113, 'learning_rate': 1.553846153846154e-05, 'epoch': 2.07}


 69%|██████▉   | 84030/121875 [26:29:05<11:19:16,  1.08s/it]

{'loss': 0.6183, 'learning_rate': 1.5526153846153844e-05, 'epoch': 2.07}


 69%|██████▉   | 84060/121875 [26:29:38<11:20:23,  1.08s/it]

{'loss': 0.6489, 'learning_rate': 1.5513846153846153e-05, 'epoch': 2.07}


 69%|██████▉   | 84090/121875 [26:30:10<11:17:07,  1.08s/it]

{'loss': 0.6023, 'learning_rate': 1.5501538461538462e-05, 'epoch': 2.07}


 69%|██████▉   | 84120/121875 [26:30:42<11:19:06,  1.08s/it]

{'loss': 0.6418, 'learning_rate': 1.5489230769230768e-05, 'epoch': 2.07}


 69%|██████▉   | 84150/121875 [26:31:15<11:16:35,  1.08s/it]

{'loss': 0.6322, 'learning_rate': 1.5476923076923077e-05, 'epoch': 2.07}


 69%|██████▉   | 84180/121875 [26:31:47<11:18:43,  1.08s/it]

{'loss': 0.6156, 'learning_rate': 1.5464615384615386e-05, 'epoch': 2.07}


 69%|██████▉   | 84210/121875 [26:32:20<11:17:25,  1.08s/it]

{'loss': 0.6235, 'learning_rate': 1.5452307692307692e-05, 'epoch': 2.07}


 69%|██████▉   | 84240/121875 [26:32:52<11:15:27,  1.08s/it]

{'loss': 0.6196, 'learning_rate': 1.544e-05, 'epoch': 2.07}


 69%|██████▉   | 84270/121875 [26:33:24<11:13:57,  1.08s/it]

{'loss': 0.6121, 'learning_rate': 1.542769230769231e-05, 'epoch': 2.07}


 69%|██████▉   | 84300/121875 [26:33:57<11:20:49,  1.09s/it]

{'loss': 0.6173, 'learning_rate': 1.5415384615384616e-05, 'epoch': 2.08}


 69%|██████▉   | 84330/121875 [26:34:29<11:14:54,  1.08s/it]

{'loss': 0.5888, 'learning_rate': 1.5403076923076922e-05, 'epoch': 2.08}


 69%|██████▉   | 84360/121875 [26:35:01<11:16:23,  1.08s/it]

{'loss': 0.5839, 'learning_rate': 1.539076923076923e-05, 'epoch': 2.08}


 69%|██████▉   | 84390/121875 [26:35:34<11:12:54,  1.08s/it]

{'loss': 0.6712, 'learning_rate': 1.537846153846154e-05, 'epoch': 2.08}


 69%|██████▉   | 84420/121875 [26:36:06<11:11:09,  1.08s/it]

{'loss': 0.6773, 'learning_rate': 1.5366153846153846e-05, 'epoch': 2.08}


 69%|██████▉   | 84450/121875 [26:36:38<11:12:48,  1.08s/it]

{'loss': 0.5672, 'learning_rate': 1.5353846153846155e-05, 'epoch': 2.08}


 69%|██████▉   | 84480/121875 [26:37:11<11:12:05,  1.08s/it]

{'loss': 0.6742, 'learning_rate': 1.5341538461538464e-05, 'epoch': 2.08}


 69%|██████▉   | 84510/121875 [26:37:45<11:29:43,  1.11s/it]

{'loss': 0.6266, 'learning_rate': 1.532923076923077e-05, 'epoch': 2.08}


 69%|██████▉   | 84540/121875 [26:38:18<11:13:44,  1.08s/it]

{'loss': 0.589, 'learning_rate': 1.531692307692308e-05, 'epoch': 2.08}


 69%|██████▉   | 84570/121875 [26:38:50<11:09:01,  1.08s/it]

{'loss': 0.585, 'learning_rate': 1.5304615384615388e-05, 'epoch': 2.08}


 69%|██████▉   | 84600/121875 [26:39:22<11:08:33,  1.08s/it]

{'loss': 0.6188, 'learning_rate': 1.529230769230769e-05, 'epoch': 2.08}


 69%|██████▉   | 84630/121875 [26:39:55<11:09:03,  1.08s/it]

{'loss': 0.6297, 'learning_rate': 1.528e-05, 'epoch': 2.08}


 69%|██████▉   | 84660/121875 [26:40:27<11:10:13,  1.08s/it]

{'loss': 0.6321, 'learning_rate': 1.526769230769231e-05, 'epoch': 2.08}


 69%|██████▉   | 84690/121875 [26:40:59<11:10:39,  1.08s/it]

{'loss': 0.5672, 'learning_rate': 1.5255384615384616e-05, 'epoch': 2.08}


 70%|██████▉   | 84720/121875 [26:41:32<11:08:16,  1.08s/it]

{'loss': 0.6041, 'learning_rate': 1.5243076923076923e-05, 'epoch': 2.09}


 70%|██████▉   | 84750/121875 [26:42:04<11:04:15,  1.07s/it]

{'loss': 0.6048, 'learning_rate': 1.5230769230769232e-05, 'epoch': 2.09}


 70%|██████▉   | 84780/121875 [26:42:37<11:05:10,  1.08s/it]

{'loss': 0.5845, 'learning_rate': 1.521846153846154e-05, 'epoch': 2.09}


 70%|██████▉   | 84810/121875 [26:43:09<11:03:06,  1.07s/it]

{'loss': 0.6267, 'learning_rate': 1.5206153846153847e-05, 'epoch': 2.09}


 70%|██████▉   | 84840/121875 [26:43:41<11:05:59,  1.08s/it]

{'loss': 0.5977, 'learning_rate': 1.5193846153846156e-05, 'epoch': 2.09}


 70%|██████▉   | 84870/121875 [26:44:14<11:02:27,  1.07s/it]

{'loss': 0.5819, 'learning_rate': 1.518153846153846e-05, 'epoch': 2.09}


 70%|██████▉   | 84900/121875 [26:44:46<11:00:50,  1.07s/it]

{'loss': 0.5977, 'learning_rate': 1.516923076923077e-05, 'epoch': 2.09}


 70%|██████▉   | 84930/121875 [26:45:18<11:03:09,  1.08s/it]

{'loss': 0.5983, 'learning_rate': 1.5156923076923077e-05, 'epoch': 2.09}


 70%|██████▉   | 84960/121875 [26:45:51<11:05:25,  1.08s/it]

{'loss': 0.6147, 'learning_rate': 1.5144615384615384e-05, 'epoch': 2.09}


 70%|██████▉   | 84990/121875 [26:46:23<10:59:57,  1.07s/it]

{'loss': 0.5873, 'learning_rate': 1.5132307692307694e-05, 'epoch': 2.09}


 70%|██████▉   | 85020/121875 [26:46:58<11:03:31,  1.08s/it]

{'loss': 0.6597, 'learning_rate': 1.5120000000000001e-05, 'epoch': 2.09}


 70%|██████▉   | 85050/121875 [26:47:30<11:02:07,  1.08s/it]

{'loss': 0.579, 'learning_rate': 1.5107692307692308e-05, 'epoch': 2.09}


 70%|██████▉   | 85080/121875 [26:48:02<11:01:25,  1.08s/it]

{'loss': 0.6121, 'learning_rate': 1.5095384615384617e-05, 'epoch': 2.09}


 70%|██████▉   | 85110/121875 [26:48:35<11:03:19,  1.08s/it]

{'loss': 0.6669, 'learning_rate': 1.5083076923076925e-05, 'epoch': 2.1}


 70%|██████▉   | 85140/121875 [26:49:07<11:00:53,  1.08s/it]

{'loss': 0.6197, 'learning_rate': 1.5070769230769232e-05, 'epoch': 2.1}


 70%|██████▉   | 85170/121875 [26:49:39<11:00:23,  1.08s/it]

{'loss': 0.5907, 'learning_rate': 1.5058461538461538e-05, 'epoch': 2.1}


 70%|██████▉   | 85200/121875 [26:50:12<10:58:19,  1.08s/it]

{'loss': 0.5797, 'learning_rate': 1.5046153846153845e-05, 'epoch': 2.1}


 70%|██████▉   | 85230/121875 [26:50:44<10:57:53,  1.08s/it]

{'loss': 0.5723, 'learning_rate': 1.5033846153846155e-05, 'epoch': 2.1}


 70%|██████▉   | 85260/121875 [26:51:16<10:56:57,  1.08s/it]

{'loss': 0.6377, 'learning_rate': 1.5021538461538462e-05, 'epoch': 2.1}


 70%|██████▉   | 85290/121875 [26:51:49<10:59:08,  1.08s/it]

{'loss': 0.6485, 'learning_rate': 1.500923076923077e-05, 'epoch': 2.1}


 70%|███████   | 85320/121875 [26:52:21<10:56:13,  1.08s/it]

{'loss': 0.6249, 'learning_rate': 1.4996923076923079e-05, 'epoch': 2.1}


 70%|███████   | 85350/121875 [26:52:54<10:55:51,  1.08s/it]

{'loss': 0.6267, 'learning_rate': 1.4984615384615386e-05, 'epoch': 2.1}


 70%|███████   | 85380/121875 [26:53:26<10:54:29,  1.08s/it]

{'loss': 0.5857, 'learning_rate': 1.4972307692307693e-05, 'epoch': 2.1}


 70%|███████   | 85410/121875 [26:53:58<10:56:16,  1.08s/it]

{'loss': 0.6147, 'learning_rate': 1.4960000000000002e-05, 'epoch': 2.1}


 70%|███████   | 85440/121875 [26:54:31<10:57:04,  1.08s/it]

{'loss': 0.6489, 'learning_rate': 1.494769230769231e-05, 'epoch': 2.1}


 70%|███████   | 85470/121875 [26:55:03<10:53:22,  1.08s/it]

{'loss': 0.6305, 'learning_rate': 1.4935384615384616e-05, 'epoch': 2.1}


 70%|███████   | 85500/121875 [26:55:35<10:54:27,  1.08s/it]

{'loss': 0.6352, 'learning_rate': 1.4923076923076923e-05, 'epoch': 2.1}


 70%|███████   | 85530/121875 [26:56:10<10:51:28,  1.08s/it]

{'loss': 0.6071, 'learning_rate': 1.491076923076923e-05, 'epoch': 2.11}


 70%|███████   | 85560/121875 [26:56:42<10:52:52,  1.08s/it]

{'loss': 0.5666, 'learning_rate': 1.489846153846154e-05, 'epoch': 2.11}


 70%|███████   | 85590/121875 [26:57:15<10:50:23,  1.08s/it]

{'loss': 0.6281, 'learning_rate': 1.4886153846153847e-05, 'epoch': 2.11}


 70%|███████   | 85620/121875 [26:57:47<10:54:19,  1.08s/it]

{'loss': 0.6268, 'learning_rate': 1.4873846153846154e-05, 'epoch': 2.11}


 70%|███████   | 85650/121875 [26:58:19<10:59:55,  1.09s/it]

{'loss': 0.6065, 'learning_rate': 1.4861538461538464e-05, 'epoch': 2.11}


 70%|███████   | 85680/121875 [26:58:52<10:48:17,  1.07s/it]

{'loss': 0.6358, 'learning_rate': 1.4849230769230771e-05, 'epoch': 2.11}


 70%|███████   | 85710/121875 [26:59:24<10:47:06,  1.07s/it]

{'loss': 0.6654, 'learning_rate': 1.4836923076923078e-05, 'epoch': 2.11}


 70%|███████   | 85740/121875 [26:59:56<10:50:00,  1.08s/it]

{'loss': 0.5463, 'learning_rate': 1.4824615384615384e-05, 'epoch': 2.11}


 70%|███████   | 85770/121875 [27:00:29<10:50:23,  1.08s/it]

{'loss': 0.6056, 'learning_rate': 1.4812307692307691e-05, 'epoch': 2.11}


 70%|███████   | 85800/121875 [27:01:01<10:44:52,  1.07s/it]

{'loss': 0.595, 'learning_rate': 1.48e-05, 'epoch': 2.11}


 70%|███████   | 85830/121875 [27:01:34<10:56:40,  1.09s/it]

{'loss': 0.6514, 'learning_rate': 1.4787692307692308e-05, 'epoch': 2.11}


 70%|███████   | 85860/121875 [27:02:06<10:49:53,  1.08s/it]

{'loss': 0.5546, 'learning_rate': 1.4775384615384615e-05, 'epoch': 2.11}


 70%|███████   | 85890/121875 [27:02:39<10:50:17,  1.08s/it]

{'loss': 0.627, 'learning_rate': 1.4763076923076925e-05, 'epoch': 2.11}


 70%|███████   | 85920/121875 [27:03:11<10:45:22,  1.08s/it]

{'loss': 0.6758, 'learning_rate': 1.4750769230769232e-05, 'epoch': 2.11}


 71%|███████   | 85950/121875 [27:03:43<10:47:41,  1.08s/it]

{'loss': 0.6527, 'learning_rate': 1.473846153846154e-05, 'epoch': 2.12}


 71%|███████   | 85980/121875 [27:04:16<10:42:59,  1.07s/it]

{'loss': 0.5689, 'learning_rate': 1.4726153846153849e-05, 'epoch': 2.12}


 71%|███████   | 86010/121875 [27:04:51<11:04:12,  1.11s/it]

{'loss': 0.5967, 'learning_rate': 1.4713846153846156e-05, 'epoch': 2.12}


 71%|███████   | 86040/121875 [27:05:23<10:42:43,  1.08s/it]

{'loss': 0.6311, 'learning_rate': 1.4701538461538462e-05, 'epoch': 2.12}


 71%|███████   | 86070/121875 [27:05:55<10:45:21,  1.08s/it]

{'loss': 0.5692, 'learning_rate': 1.4689230769230769e-05, 'epoch': 2.12}


 71%|███████   | 86100/121875 [27:06:28<10:44:40,  1.08s/it]

{'loss': 0.5617, 'learning_rate': 1.4676923076923076e-05, 'epoch': 2.12}


 71%|███████   | 86130/121875 [27:07:00<10:42:15,  1.08s/it]

{'loss': 0.6708, 'learning_rate': 1.4664615384615386e-05, 'epoch': 2.12}


 71%|███████   | 86160/121875 [27:07:32<10:41:37,  1.08s/it]

{'loss': 0.5913, 'learning_rate': 1.4652307692307693e-05, 'epoch': 2.12}


 71%|███████   | 86190/121875 [27:08:05<10:39:55,  1.08s/it]

{'loss': 0.5924, 'learning_rate': 1.464e-05, 'epoch': 2.12}


 71%|███████   | 86220/121875 [27:08:37<10:41:22,  1.08s/it]

{'loss': 0.6135, 'learning_rate': 1.462769230769231e-05, 'epoch': 2.12}


 71%|███████   | 86250/121875 [27:09:10<10:41:57,  1.08s/it]

{'loss': 0.5426, 'learning_rate': 1.4615384615384617e-05, 'epoch': 2.12}


 71%|███████   | 86280/121875 [27:09:42<10:39:53,  1.08s/it]

{'loss': 0.6221, 'learning_rate': 1.4603076923076924e-05, 'epoch': 2.12}


 71%|███████   | 86310/121875 [27:10:14<10:37:09,  1.07s/it]

{'loss': 0.6101, 'learning_rate': 1.4590769230769234e-05, 'epoch': 2.12}


 71%|███████   | 86340/121875 [27:10:47<10:39:24,  1.08s/it]

{'loss': 0.6191, 'learning_rate': 1.4578461538461538e-05, 'epoch': 2.13}


 71%|███████   | 86370/121875 [27:11:19<10:38:15,  1.08s/it]

{'loss': 0.5718, 'learning_rate': 1.4566153846153847e-05, 'epoch': 2.13}


 71%|███████   | 86400/121875 [27:11:51<10:37:18,  1.08s/it]

{'loss': 0.5875, 'learning_rate': 1.4553846153846154e-05, 'epoch': 2.13}


 71%|███████   | 86430/121875 [27:12:24<10:37:26,  1.08s/it]

{'loss': 0.6773, 'learning_rate': 1.4541538461538461e-05, 'epoch': 2.13}


 71%|███████   | 86460/121875 [27:12:57<10:40:54,  1.09s/it]

{'loss': 0.6694, 'learning_rate': 1.452923076923077e-05, 'epoch': 2.13}


 71%|███████   | 86490/121875 [27:13:29<10:34:19,  1.08s/it]

{'loss': 0.572, 'learning_rate': 1.4516923076923078e-05, 'epoch': 2.13}


 71%|███████   | 86520/121875 [27:14:04<10:41:00,  1.09s/it]

{'loss': 0.5838, 'learning_rate': 1.4504615384615385e-05, 'epoch': 2.13}


 71%|███████   | 86550/121875 [27:14:37<10:33:18,  1.08s/it]

{'loss': 0.6042, 'learning_rate': 1.4492307692307695e-05, 'epoch': 2.13}


 71%|███████   | 86580/121875 [27:15:09<10:35:30,  1.08s/it]

{'loss': 0.5801, 'learning_rate': 1.4480000000000002e-05, 'epoch': 2.13}


 71%|███████   | 86610/121875 [27:15:41<10:35:27,  1.08s/it]

{'loss': 0.6, 'learning_rate': 1.4467692307692308e-05, 'epoch': 2.13}


 71%|███████   | 86640/121875 [27:16:14<10:32:51,  1.08s/it]

{'loss': 0.5771, 'learning_rate': 1.4455384615384615e-05, 'epoch': 2.13}


 71%|███████   | 86670/121875 [27:16:46<10:38:31,  1.09s/it]

{'loss': 0.6245, 'learning_rate': 1.4443076923076923e-05, 'epoch': 2.13}


 71%|███████   | 86700/121875 [27:17:19<10:33:38,  1.08s/it]

{'loss': 0.5861, 'learning_rate': 1.4430769230769232e-05, 'epoch': 2.13}


 71%|███████   | 86730/121875 [27:17:51<10:32:57,  1.08s/it]

{'loss': 0.6394, 'learning_rate': 1.4418461538461539e-05, 'epoch': 2.13}


 71%|███████   | 86760/121875 [27:18:24<10:30:23,  1.08s/it]

{'loss': 0.6559, 'learning_rate': 1.4406153846153846e-05, 'epoch': 2.14}


 71%|███████   | 86790/121875 [27:18:56<10:27:39,  1.07s/it]

{'loss': 0.6268, 'learning_rate': 1.4393846153846156e-05, 'epoch': 2.14}


 71%|███████   | 86820/121875 [27:19:28<10:30:06,  1.08s/it]

{'loss': 0.603, 'learning_rate': 1.4381538461538463e-05, 'epoch': 2.14}


 71%|███████▏  | 86850/121875 [27:20:01<10:38:25,  1.09s/it]

{'loss': 0.6433, 'learning_rate': 1.436923076923077e-05, 'epoch': 2.14}


 71%|███████▏  | 86880/121875 [27:20:33<10:27:23,  1.08s/it]

{'loss': 0.6468, 'learning_rate': 1.435692307692308e-05, 'epoch': 2.14}


 71%|███████▏  | 86910/121875 [27:21:06<10:29:06,  1.08s/it]

{'loss': 0.6045, 'learning_rate': 1.4344615384615384e-05, 'epoch': 2.14}


 71%|███████▏  | 86940/121875 [27:21:38<10:25:56,  1.08s/it]

{'loss': 0.616, 'learning_rate': 1.4332307692307693e-05, 'epoch': 2.14}


 71%|███████▏  | 86970/121875 [27:22:10<10:28:26,  1.08s/it]

{'loss': 0.6764, 'learning_rate': 1.432e-05, 'epoch': 2.14}


 71%|███████▏  | 87000/121875 [27:22:43<10:25:37,  1.08s/it]

{'loss': 0.5817, 'learning_rate': 1.4307692307692308e-05, 'epoch': 2.14}


 71%|███████▏  | 87030/121875 [27:23:18<10:27:11,  1.08s/it]

{'loss': 0.6559, 'learning_rate': 1.4295384615384617e-05, 'epoch': 2.14}


 71%|███████▏  | 87060/121875 [27:23:50<10:28:58,  1.08s/it]

{'loss': 0.6055, 'learning_rate': 1.4283076923076924e-05, 'epoch': 2.14}


 71%|███████▏  | 87090/121875 [27:24:22<10:24:38,  1.08s/it]

{'loss': 0.6226, 'learning_rate': 1.4270769230769231e-05, 'epoch': 2.14}


 71%|███████▏  | 87120/121875 [27:24:55<10:25:01,  1.08s/it]

{'loss': 0.6148, 'learning_rate': 1.425846153846154e-05, 'epoch': 2.14}


 72%|███████▏  | 87150/121875 [27:25:27<10:24:00,  1.08s/it]

{'loss': 0.6324, 'learning_rate': 1.4246153846153848e-05, 'epoch': 2.15}


 72%|███████▏  | 87180/121875 [27:26:00<10:24:09,  1.08s/it]

{'loss': 0.6271, 'learning_rate': 1.4233846153846154e-05, 'epoch': 2.15}


 72%|███████▏  | 87210/121875 [27:26:32<10:26:12,  1.08s/it]

{'loss': 0.6585, 'learning_rate': 1.4221538461538461e-05, 'epoch': 2.15}


 72%|███████▏  | 87240/121875 [27:27:05<10:28:19,  1.09s/it]

{'loss': 0.5759, 'learning_rate': 1.4209230769230769e-05, 'epoch': 2.15}


 72%|███████▏  | 87270/121875 [27:27:37<10:27:44,  1.09s/it]

{'loss': 0.5454, 'learning_rate': 1.4196923076923078e-05, 'epoch': 2.15}


 72%|███████▏  | 87300/121875 [27:28:10<10:24:47,  1.08s/it]

{'loss': 0.5898, 'learning_rate': 1.4184615384615385e-05, 'epoch': 2.15}


 72%|███████▏  | 87330/121875 [27:28:42<10:22:00,  1.08s/it]

{'loss': 0.5836, 'learning_rate': 1.4172307692307693e-05, 'epoch': 2.15}


 72%|███████▏  | 87360/121875 [27:29:15<10:20:55,  1.08s/it]

{'loss': 0.66, 'learning_rate': 1.4160000000000002e-05, 'epoch': 2.15}


 72%|███████▏  | 87390/121875 [27:29:47<10:22:26,  1.08s/it]

{'loss': 0.6113, 'learning_rate': 1.4147692307692309e-05, 'epoch': 2.15}


 72%|███████▏  | 87420/121875 [27:30:19<10:20:11,  1.08s/it]

{'loss': 0.5503, 'learning_rate': 1.4135384615384616e-05, 'epoch': 2.15}


 72%|███████▏  | 87450/121875 [27:30:52<10:20:01,  1.08s/it]

{'loss': 0.6443, 'learning_rate': 1.4123076923076926e-05, 'epoch': 2.15}


 72%|███████▏  | 87480/121875 [27:31:25<10:21:52,  1.08s/it]

{'loss': 0.5902, 'learning_rate': 1.411076923076923e-05, 'epoch': 2.15}


 72%|███████▏  | 87510/121875 [27:31:59<10:34:12,  1.11s/it]

{'loss': 0.5968, 'learning_rate': 1.4098461538461539e-05, 'epoch': 2.15}


 72%|███████▏  | 87540/121875 [27:32:32<10:17:29,  1.08s/it]

{'loss': 0.6369, 'learning_rate': 1.4086153846153846e-05, 'epoch': 2.15}


 72%|███████▏  | 87570/121875 [27:33:04<10:15:51,  1.08s/it]

{'loss': 0.6315, 'learning_rate': 1.4073846153846154e-05, 'epoch': 2.16}


 72%|███████▏  | 87600/121875 [27:33:37<10:16:42,  1.08s/it]

{'loss': 0.5599, 'learning_rate': 1.4061538461538463e-05, 'epoch': 2.16}


 72%|███████▏  | 87630/121875 [27:34:09<10:19:11,  1.08s/it]

{'loss': 0.6258, 'learning_rate': 1.404923076923077e-05, 'epoch': 2.16}


 72%|███████▏  | 87660/121875 [27:34:41<10:19:54,  1.09s/it]

{'loss': 0.6536, 'learning_rate': 1.4036923076923078e-05, 'epoch': 2.16}


 72%|███████▏  | 87690/121875 [27:35:14<10:15:25,  1.08s/it]

{'loss': 0.6107, 'learning_rate': 1.4024615384615387e-05, 'epoch': 2.16}


 72%|███████▏  | 87720/121875 [27:35:46<10:16:14,  1.08s/it]

{'loss': 0.6337, 'learning_rate': 1.4012307692307694e-05, 'epoch': 2.16}


 72%|███████▏  | 87750/121875 [27:36:19<10:20:41,  1.09s/it]

{'loss': 0.539, 'learning_rate': 1.4000000000000001e-05, 'epoch': 2.16}


 72%|███████▏  | 87780/121875 [27:36:51<10:16:26,  1.08s/it]

{'loss': 0.6324, 'learning_rate': 1.3987692307692307e-05, 'epoch': 2.16}


 72%|███████▏  | 87810/121875 [27:37:24<10:13:09,  1.08s/it]

{'loss': 0.6489, 'learning_rate': 1.3975384615384615e-05, 'epoch': 2.16}


 72%|███████▏  | 87840/121875 [27:37:56<10:13:02,  1.08s/it]

{'loss': 0.6269, 'learning_rate': 1.3963076923076924e-05, 'epoch': 2.16}


 72%|███████▏  | 87870/121875 [27:38:29<10:14:51,  1.08s/it]

{'loss': 0.612, 'learning_rate': 1.3950769230769231e-05, 'epoch': 2.16}


 72%|███████▏  | 87900/121875 [27:39:01<10:13:18,  1.08s/it]

{'loss': 0.6663, 'learning_rate': 1.3938461538461539e-05, 'epoch': 2.16}


 72%|███████▏  | 87930/121875 [27:39:34<10:15:26,  1.09s/it]

{'loss': 0.5868, 'learning_rate': 1.3926153846153848e-05, 'epoch': 2.16}


 72%|███████▏  | 87960/121875 [27:40:06<10:10:27,  1.08s/it]

{'loss': 0.5678, 'learning_rate': 1.3913846153846155e-05, 'epoch': 2.17}


 72%|███████▏  | 87990/121875 [27:40:39<10:14:04,  1.09s/it]

{'loss': 0.5261, 'learning_rate': 1.3901538461538463e-05, 'epoch': 2.17}


 72%|███████▏  | 88020/121875 [27:41:14<10:11:51,  1.08s/it]

{'loss': 0.6044, 'learning_rate': 1.3889230769230772e-05, 'epoch': 2.17}


 72%|███████▏  | 88050/121875 [27:41:46<10:08:14,  1.08s/it]

{'loss': 0.6604, 'learning_rate': 1.3876923076923076e-05, 'epoch': 2.17}


 72%|███████▏  | 88080/121875 [27:42:19<10:06:59,  1.08s/it]

{'loss': 0.6451, 'learning_rate': 1.3864615384615385e-05, 'epoch': 2.17}


 72%|███████▏  | 88110/121875 [27:42:51<10:06:10,  1.08s/it]

{'loss': 0.5989, 'learning_rate': 1.3852307692307692e-05, 'epoch': 2.17}


 72%|███████▏  | 88140/121875 [27:43:23<10:07:09,  1.08s/it]

{'loss': 0.6633, 'learning_rate': 1.384e-05, 'epoch': 2.17}


 72%|███████▏  | 88170/121875 [27:43:56<10:09:15,  1.08s/it]

{'loss': 0.5967, 'learning_rate': 1.3827692307692309e-05, 'epoch': 2.17}


 72%|███████▏  | 88200/121875 [27:44:29<10:05:57,  1.08s/it]

{'loss': 0.5878, 'learning_rate': 1.3815384615384616e-05, 'epoch': 2.17}


 72%|███████▏  | 88230/121875 [27:45:01<10:00:05,  1.07s/it]

{'loss': 0.6272, 'learning_rate': 1.3803076923076924e-05, 'epoch': 2.17}


 72%|███████▏  | 88260/121875 [27:45:33<10:06:54,  1.08s/it]

{'loss': 0.6017, 'learning_rate': 1.3790769230769233e-05, 'epoch': 2.17}


 72%|███████▏  | 88290/121875 [27:46:06<10:07:30,  1.09s/it]

{'loss': 0.5878, 'learning_rate': 1.377846153846154e-05, 'epoch': 2.17}


 72%|███████▏  | 88320/121875 [27:46:38<10:03:45,  1.08s/it]

{'loss': 0.5774, 'learning_rate': 1.3766153846153848e-05, 'epoch': 2.17}


 72%|███████▏  | 88350/121875 [27:47:11<10:04:24,  1.08s/it]

{'loss': 0.5958, 'learning_rate': 1.3753846153846153e-05, 'epoch': 2.17}


 73%|███████▎  | 88380/121875 [27:47:43<10:02:25,  1.08s/it]

{'loss': 0.5206, 'learning_rate': 1.374153846153846e-05, 'epoch': 2.18}


 73%|███████▎  | 88410/121875 [27:48:15<10:01:41,  1.08s/it]

{'loss': 0.6146, 'learning_rate': 1.372923076923077e-05, 'epoch': 2.18}


 73%|███████▎  | 88440/121875 [27:48:48<9:57:45,  1.07s/it] 

{'loss': 0.6204, 'learning_rate': 1.3716923076923077e-05, 'epoch': 2.18}


 73%|███████▎  | 88470/121875 [27:49:20<10:01:17,  1.08s/it]

{'loss': 0.6566, 'learning_rate': 1.3704615384615385e-05, 'epoch': 2.18}


 73%|███████▎  | 88500/121875 [27:49:53<10:00:32,  1.08s/it]

{'loss': 0.6133, 'learning_rate': 1.3692307692307694e-05, 'epoch': 2.18}


 73%|███████▎  | 88530/121875 [27:50:27<10:00:15,  1.08s/it]

{'loss': 0.6379, 'learning_rate': 1.3680000000000001e-05, 'epoch': 2.18}


 73%|███████▎  | 88560/121875 [27:51:00<10:01:26,  1.08s/it]

{'loss': 0.662, 'learning_rate': 1.3667692307692309e-05, 'epoch': 2.18}


 73%|███████▎  | 88590/121875 [27:51:32<9:58:17,  1.08s/it] 

{'loss': 0.6294, 'learning_rate': 1.3655384615384618e-05, 'epoch': 2.18}


 73%|███████▎  | 88620/121875 [27:52:04<9:57:42,  1.08s/it] 

{'loss': 0.5877, 'learning_rate': 1.3643076923076925e-05, 'epoch': 2.18}


 73%|███████▎  | 88650/121875 [27:52:37<10:03:23,  1.09s/it]

{'loss': 0.5593, 'learning_rate': 1.363076923076923e-05, 'epoch': 2.18}


 73%|███████▎  | 88680/121875 [27:53:09<10:00:02,  1.08s/it]

{'loss': 0.6434, 'learning_rate': 1.3618461538461538e-05, 'epoch': 2.18}


 73%|███████▎  | 88710/121875 [27:53:42<9:58:17,  1.08s/it] 

{'loss': 0.6188, 'learning_rate': 1.3606153846153846e-05, 'epoch': 2.18}


 73%|███████▎  | 88740/121875 [27:54:14<9:57:32,  1.08s/it] 

{'loss': 0.6215, 'learning_rate': 1.3593846153846155e-05, 'epoch': 2.18}


 73%|███████▎  | 88770/121875 [27:54:47<9:56:47,  1.08s/it] 

{'loss': 0.6103, 'learning_rate': 1.3581538461538462e-05, 'epoch': 2.19}


 73%|███████▎  | 88800/121875 [27:55:19<9:56:10,  1.08s/it] 

{'loss': 0.6082, 'learning_rate': 1.356923076923077e-05, 'epoch': 2.19}


 73%|███████▎  | 88830/121875 [27:55:52<9:54:48,  1.08s/it] 

{'loss': 0.6642, 'learning_rate': 1.3556923076923079e-05, 'epoch': 2.19}


 73%|███████▎  | 88860/121875 [27:56:24<9:51:19,  1.07s/it] 

{'loss': 0.5885, 'learning_rate': 1.3544615384615386e-05, 'epoch': 2.19}


 73%|███████▎  | 88890/121875 [27:56:57<9:54:08,  1.08s/it]

{'loss': 0.6794, 'learning_rate': 1.3532307692307694e-05, 'epoch': 2.19}


 73%|███████▎  | 88920/121875 [27:57:29<9:55:30,  1.08s/it]

{'loss': 0.6495, 'learning_rate': 1.352e-05, 'epoch': 2.19}


 73%|███████▎  | 88950/121875 [27:58:01<9:49:13,  1.07s/it] 

{'loss': 0.6196, 'learning_rate': 1.3507692307692307e-05, 'epoch': 2.19}


 73%|███████▎  | 88980/121875 [27:58:34<9:53:45,  1.08s/it]

{'loss': 0.6342, 'learning_rate': 1.3495384615384616e-05, 'epoch': 2.19}


 73%|███████▎  | 89010/121875 [27:59:09<10:09:43,  1.11s/it]

{'loss': 0.609, 'learning_rate': 1.3483076923076923e-05, 'epoch': 2.19}


 73%|███████▎  | 89040/121875 [27:59:41<9:57:08,  1.09s/it] 

{'loss': 0.6046, 'learning_rate': 1.347076923076923e-05, 'epoch': 2.19}


 73%|███████▎  | 89070/121875 [28:00:14<9:52:24,  1.08s/it] 

{'loss': 0.6114, 'learning_rate': 1.345846153846154e-05, 'epoch': 2.19}


 73%|███████▎  | 89100/121875 [28:00:46<9:48:23,  1.08s/it]

{'loss': 0.6004, 'learning_rate': 1.3446153846153847e-05, 'epoch': 2.19}


 73%|███████▎  | 89130/121875 [28:01:18<9:50:54,  1.08s/it]

{'loss': 0.583, 'learning_rate': 1.3433846153846155e-05, 'epoch': 2.19}


 73%|███████▎  | 89160/121875 [28:01:51<9:48:56,  1.08s/it]

{'loss': 0.6587, 'learning_rate': 1.3421538461538464e-05, 'epoch': 2.19}


 73%|███████▎  | 89190/121875 [28:02:23<9:50:24,  1.08s/it]

{'loss': 0.5994, 'learning_rate': 1.3409230769230771e-05, 'epoch': 2.2}


 73%|███████▎  | 89220/121875 [28:02:56<9:50:09,  1.08s/it]

{'loss': 0.5824, 'learning_rate': 1.3396923076923077e-05, 'epoch': 2.2}


 73%|███████▎  | 89250/121875 [28:03:28<9:51:04,  1.09s/it]

{'loss': 0.6832, 'learning_rate': 1.3384615384615384e-05, 'epoch': 2.2}


 73%|███████▎  | 89280/121875 [28:04:01<9:49:43,  1.09s/it]

{'loss': 0.5938, 'learning_rate': 1.3372307692307692e-05, 'epoch': 2.2}


 73%|███████▎  | 89310/121875 [28:04:33<9:50:43,  1.09s/it]

{'loss': 0.6529, 'learning_rate': 1.336e-05, 'epoch': 2.2}


 73%|███████▎  | 89340/121875 [28:05:06<9:42:48,  1.07s/it]

{'loss': 0.6314, 'learning_rate': 1.3347692307692308e-05, 'epoch': 2.2}


 73%|███████▎  | 89370/121875 [28:05:38<9:43:30,  1.08s/it]

{'loss': 0.6379, 'learning_rate': 1.3335384615384616e-05, 'epoch': 2.2}


 73%|███████▎  | 89400/121875 [28:06:10<9:41:13,  1.07s/it]

{'loss': 0.546, 'learning_rate': 1.3323076923076925e-05, 'epoch': 2.2}


 73%|███████▎  | 89430/121875 [28:06:43<9:43:24,  1.08s/it]

{'loss': 0.7061, 'learning_rate': 1.3310769230769232e-05, 'epoch': 2.2}


 73%|███████▎  | 89460/121875 [28:07:15<9:44:33,  1.08s/it]

{'loss': 0.6017, 'learning_rate': 1.329846153846154e-05, 'epoch': 2.2}


 73%|███████▎  | 89490/121875 [28:07:47<9:41:59,  1.08s/it]

{'loss': 0.6209, 'learning_rate': 1.3286153846153849e-05, 'epoch': 2.2}


 73%|███████▎  | 89520/121875 [28:08:22<9:47:04,  1.09s/it] 

{'loss': 0.6332, 'learning_rate': 1.3273846153846153e-05, 'epoch': 2.2}


 73%|███████▎  | 89550/121875 [28:08:55<9:42:17,  1.08s/it]

{'loss': 0.6406, 'learning_rate': 1.3261538461538462e-05, 'epoch': 2.2}


 74%|███████▎  | 89580/121875 [28:09:27<9:40:43,  1.08s/it]

{'loss': 0.5512, 'learning_rate': 1.324923076923077e-05, 'epoch': 2.21}


 74%|███████▎  | 89610/121875 [28:10:00<9:41:43,  1.08s/it]

{'loss': 0.6192, 'learning_rate': 1.3236923076923077e-05, 'epoch': 2.21}


 74%|███████▎  | 89640/121875 [28:10:32<9:45:12,  1.09s/it]

{'loss': 0.6208, 'learning_rate': 1.3224615384615386e-05, 'epoch': 2.21}


 74%|███████▎  | 89670/121875 [28:11:05<9:39:12,  1.08s/it]

{'loss': 0.6595, 'learning_rate': 1.3212307692307693e-05, 'epoch': 2.21}


 74%|███████▎  | 89700/121875 [28:11:37<9:40:03,  1.08s/it]

{'loss': 0.6073, 'learning_rate': 1.32e-05, 'epoch': 2.21}


 74%|███████▎  | 89730/121875 [28:12:09<9:36:27,  1.08s/it]

{'loss': 0.5759, 'learning_rate': 1.318769230769231e-05, 'epoch': 2.21}


 74%|███████▎  | 89760/121875 [28:12:42<9:36:54,  1.08s/it]

{'loss': 0.6405, 'learning_rate': 1.3175384615384617e-05, 'epoch': 2.21}


 74%|███████▎  | 89790/121875 [28:13:14<9:42:30,  1.09s/it]

{'loss': 0.6285, 'learning_rate': 1.3163076923076923e-05, 'epoch': 2.21}


 74%|███████▎  | 89820/121875 [28:13:47<9:37:13,  1.08s/it]

{'loss': 0.6457, 'learning_rate': 1.315076923076923e-05, 'epoch': 2.21}


 74%|███████▎  | 89850/121875 [28:14:19<9:38:03,  1.08s/it]

{'loss': 0.6258, 'learning_rate': 1.3138461538461538e-05, 'epoch': 2.21}


 74%|███████▎  | 89880/121875 [28:14:52<9:33:34,  1.08s/it]

{'loss': 0.5681, 'learning_rate': 1.3126153846153847e-05, 'epoch': 2.21}


 74%|███████▍  | 89910/121875 [28:15:24<9:38:27,  1.09s/it]

{'loss': 0.6612, 'learning_rate': 1.3113846153846154e-05, 'epoch': 2.21}


 74%|███████▍  | 89940/121875 [28:15:56<9:41:16,  1.09s/it]

{'loss': 0.6401, 'learning_rate': 1.3101538461538462e-05, 'epoch': 2.21}


 74%|███████▍  | 89970/121875 [28:16:29<9:35:29,  1.08s/it]

{'loss': 0.6405, 'learning_rate': 1.308923076923077e-05, 'epoch': 2.21}


 74%|███████▍  | 90000/121875 [28:17:01<9:36:01,  1.08s/it]

{'loss': 0.6468, 'learning_rate': 1.3076923076923078e-05, 'epoch': 2.22}


 74%|███████▍  | 90030/121875 [28:17:36<9:31:51,  1.08s/it] 

{'loss': 0.6244, 'learning_rate': 1.3064615384615386e-05, 'epoch': 2.22}


 74%|███████▍  | 90060/121875 [28:18:09<9:33:06,  1.08s/it]

{'loss': 0.5868, 'learning_rate': 1.3052307692307695e-05, 'epoch': 2.22}


 74%|███████▍  | 90090/121875 [28:18:41<9:32:52,  1.08s/it]

{'loss': 0.6255, 'learning_rate': 1.3039999999999999e-05, 'epoch': 2.22}


 74%|███████▍  | 90120/121875 [28:19:14<9:35:15,  1.09s/it]

{'loss': 0.7095, 'learning_rate': 1.3027692307692308e-05, 'epoch': 2.22}


 74%|███████▍  | 90150/121875 [28:19:46<9:30:57,  1.08s/it]

{'loss': 0.5946, 'learning_rate': 1.3015384615384615e-05, 'epoch': 2.22}


 74%|███████▍  | 90180/121875 [28:20:18<9:29:17,  1.08s/it]

{'loss': 0.6261, 'learning_rate': 1.3003076923076923e-05, 'epoch': 2.22}


 74%|███████▍  | 90210/121875 [28:20:51<9:27:12,  1.07s/it]

{'loss': 0.6703, 'learning_rate': 1.2990769230769232e-05, 'epoch': 2.22}


 74%|███████▍  | 90240/121875 [28:21:23<9:27:12,  1.08s/it]

{'loss': 0.6403, 'learning_rate': 1.297846153846154e-05, 'epoch': 2.22}


 74%|███████▍  | 90270/121875 [28:21:56<9:28:11,  1.08s/it]

{'loss': 0.5908, 'learning_rate': 1.2966153846153847e-05, 'epoch': 2.22}


 74%|███████▍  | 90300/121875 [28:22:28<9:31:06,  1.09s/it]

{'loss': 0.6164, 'learning_rate': 1.2953846153846156e-05, 'epoch': 2.22}


 74%|███████▍  | 90330/121875 [28:23:00<9:27:33,  1.08s/it]

{'loss': 0.5849, 'learning_rate': 1.2941538461538463e-05, 'epoch': 2.22}


 74%|███████▍  | 90360/121875 [28:23:33<9:25:21,  1.08s/it]

{'loss': 0.5995, 'learning_rate': 1.2929230769230769e-05, 'epoch': 2.22}


 74%|███████▍  | 90390/121875 [28:24:05<9:30:05,  1.09s/it]

{'loss': 0.6102, 'learning_rate': 1.2916923076923076e-05, 'epoch': 2.22}


 74%|███████▍  | 90420/121875 [28:24:38<9:25:08,  1.08s/it]

{'loss': 0.5541, 'learning_rate': 1.2904615384615384e-05, 'epoch': 2.23}


 74%|███████▍  | 90450/121875 [28:25:10<9:27:07,  1.08s/it]

{'loss': 0.5966, 'learning_rate': 1.2892307692307693e-05, 'epoch': 2.23}


 74%|███████▍  | 90480/121875 [28:25:43<9:27:56,  1.09s/it]

{'loss': 0.6426, 'learning_rate': 1.288e-05, 'epoch': 2.23}


 74%|███████▍  | 90510/121875 [28:26:18<9:42:01,  1.11s/it] 

{'loss': 0.5746, 'learning_rate': 1.2867692307692308e-05, 'epoch': 2.23}


 74%|███████▍  | 90540/121875 [28:26:50<9:23:23,  1.08s/it]

{'loss': 0.6248, 'learning_rate': 1.2855384615384617e-05, 'epoch': 2.23}


 74%|███████▍  | 90570/121875 [28:27:22<9:24:46,  1.08s/it]

{'loss': 0.6146, 'learning_rate': 1.2843076923076924e-05, 'epoch': 2.23}


 74%|███████▍  | 90600/121875 [28:27:55<9:21:05,  1.08s/it]

{'loss': 0.6945, 'learning_rate': 1.2830769230769232e-05, 'epoch': 2.23}


 74%|███████▍  | 90630/121875 [28:28:27<9:25:13,  1.09s/it]

{'loss': 0.6624, 'learning_rate': 1.281846153846154e-05, 'epoch': 2.23}


 74%|███████▍  | 90660/121875 [28:29:00<9:23:45,  1.08s/it]

{'loss': 0.6167, 'learning_rate': 1.2806153846153845e-05, 'epoch': 2.23}


 74%|███████▍  | 90690/121875 [28:29:32<9:23:53,  1.08s/it]

{'loss': 0.6224, 'learning_rate': 1.2793846153846154e-05, 'epoch': 2.23}


 74%|███████▍  | 90720/121875 [28:30:05<9:23:23,  1.09s/it]

{'loss': 0.6668, 'learning_rate': 1.2781538461538461e-05, 'epoch': 2.23}


 74%|███████▍  | 90750/121875 [28:30:37<9:19:46,  1.08s/it]

{'loss': 0.5904, 'learning_rate': 1.2769230769230769e-05, 'epoch': 2.23}


 74%|███████▍  | 90780/121875 [28:31:10<9:21:44,  1.08s/it]

{'loss': 0.6257, 'learning_rate': 1.2756923076923078e-05, 'epoch': 2.23}


 75%|███████▍  | 90810/121875 [28:31:42<9:22:45,  1.09s/it]

{'loss': 0.5601, 'learning_rate': 1.2744615384615385e-05, 'epoch': 2.24}


 75%|███████▍  | 90840/121875 [28:32:15<9:19:08,  1.08s/it]

{'loss': 0.6573, 'learning_rate': 1.2732307692307693e-05, 'epoch': 2.24}


 75%|███████▍  | 90870/121875 [28:32:47<9:18:15,  1.08s/it]

{'loss': 0.5755, 'learning_rate': 1.2720000000000002e-05, 'epoch': 2.24}


 75%|███████▍  | 90900/121875 [28:33:19<9:18:07,  1.08s/it]

{'loss': 0.6318, 'learning_rate': 1.270769230769231e-05, 'epoch': 2.24}


 75%|███████▍  | 90930/121875 [28:33:52<9:20:29,  1.09s/it]

{'loss': 0.6197, 'learning_rate': 1.2695384615384617e-05, 'epoch': 2.24}


 75%|███████▍  | 90960/121875 [28:34:24<9:22:48,  1.09s/it]

{'loss': 0.5986, 'learning_rate': 1.2683076923076922e-05, 'epoch': 2.24}


 75%|███████▍  | 90990/121875 [28:34:57<9:16:25,  1.08s/it]

{'loss': 0.6833, 'learning_rate': 1.267076923076923e-05, 'epoch': 2.24}


 75%|███████▍  | 91020/121875 [28:35:32<9:14:42,  1.08s/it] 

{'loss': 0.5893, 'learning_rate': 1.2658461538461539e-05, 'epoch': 2.24}


 75%|███████▍  | 91050/121875 [28:36:04<9:17:59,  1.09s/it]

{'loss': 0.5903, 'learning_rate': 1.2646153846153846e-05, 'epoch': 2.24}


 75%|███████▍  | 91080/121875 [28:36:37<9:21:28,  1.09s/it]

{'loss': 0.5983, 'learning_rate': 1.2633846153846154e-05, 'epoch': 2.24}


 75%|███████▍  | 91110/121875 [28:37:10<9:13:29,  1.08s/it]

{'loss': 0.6389, 'learning_rate': 1.2621538461538463e-05, 'epoch': 2.24}


 75%|███████▍  | 91140/121875 [28:37:42<9:13:52,  1.08s/it]

{'loss': 0.5778, 'learning_rate': 1.260923076923077e-05, 'epoch': 2.24}


 75%|███████▍  | 91170/121875 [28:38:14<9:14:39,  1.08s/it]

{'loss': 0.6016, 'learning_rate': 1.2596923076923078e-05, 'epoch': 2.24}


 75%|███████▍  | 91200/121875 [28:38:47<9:19:26,  1.09s/it]

{'loss': 0.6052, 'learning_rate': 1.2584615384615387e-05, 'epoch': 2.24}


 75%|███████▍  | 91230/121875 [28:39:20<9:14:09,  1.09s/it]

{'loss': 0.6436, 'learning_rate': 1.2572307692307691e-05, 'epoch': 2.25}


 75%|███████▍  | 91260/121875 [28:39:52<9:10:34,  1.08s/it]

{'loss': 0.6274, 'learning_rate': 1.256e-05, 'epoch': 2.25}


 75%|███████▍  | 91290/121875 [28:40:24<9:12:06,  1.08s/it]

{'loss': 0.63, 'learning_rate': 1.2547692307692307e-05, 'epoch': 2.25}


 75%|███████▍  | 91320/121875 [28:40:57<9:14:12,  1.09s/it]

{'loss': 0.627, 'learning_rate': 1.2535384615384615e-05, 'epoch': 2.25}


 75%|███████▍  | 91350/121875 [28:41:30<9:07:51,  1.08s/it]

{'loss': 0.5575, 'learning_rate': 1.2523076923076924e-05, 'epoch': 2.25}


 75%|███████▍  | 91380/121875 [28:42:02<9:09:05,  1.08s/it]

{'loss': 0.6529, 'learning_rate': 1.2510769230769231e-05, 'epoch': 2.25}


 75%|███████▌  | 91410/121875 [28:42:34<9:07:39,  1.08s/it]

{'loss': 0.6264, 'learning_rate': 1.2498461538461539e-05, 'epoch': 2.25}


 75%|███████▌  | 91440/121875 [28:43:07<9:07:36,  1.08s/it]

{'loss': 0.5853, 'learning_rate': 1.2486153846153848e-05, 'epoch': 2.25}


 75%|███████▌  | 91470/121875 [28:43:39<9:06:02,  1.08s/it]

{'loss': 0.5569, 'learning_rate': 1.2473846153846154e-05, 'epoch': 2.25}


 75%|███████▌  | 91500/121875 [28:44:11<9:06:49,  1.08s/it]

{'loss': 0.5499, 'learning_rate': 1.2461538461538463e-05, 'epoch': 2.25}


 75%|███████▌  | 91530/121875 [28:44:46<9:08:05,  1.08s/it] 

{'loss': 0.6316, 'learning_rate': 1.244923076923077e-05, 'epoch': 2.25}


 75%|███████▌  | 91560/121875 [28:45:19<9:03:40,  1.08s/it]

{'loss': 0.6265, 'learning_rate': 1.2436923076923078e-05, 'epoch': 2.25}


 75%|███████▌  | 91590/121875 [28:45:51<9:08:19,  1.09s/it]

{'loss': 0.5797, 'learning_rate': 1.2424615384615385e-05, 'epoch': 2.25}


 75%|███████▌  | 91620/121875 [28:46:24<9:05:55,  1.08s/it]

{'loss': 0.6349, 'learning_rate': 1.2412307692307692e-05, 'epoch': 2.26}


 75%|███████▌  | 91650/121875 [28:46:56<9:03:15,  1.08s/it]

{'loss': 0.6393, 'learning_rate': 1.24e-05, 'epoch': 2.26}


 75%|███████▌  | 91680/121875 [28:47:28<9:03:48,  1.08s/it]

{'loss': 0.5999, 'learning_rate': 1.2387692307692309e-05, 'epoch': 2.26}


 75%|███████▌  | 91710/121875 [28:48:01<9:07:38,  1.09s/it]

{'loss': 0.5982, 'learning_rate': 1.2375384615384616e-05, 'epoch': 2.26}


 75%|███████▌  | 91740/121875 [28:48:33<9:04:44,  1.08s/it]

{'loss': 0.5895, 'learning_rate': 1.2363076923076924e-05, 'epoch': 2.26}


 75%|███████▌  | 91770/121875 [28:49:06<9:04:17,  1.08s/it]

{'loss': 0.625, 'learning_rate': 1.2350769230769231e-05, 'epoch': 2.26}


 75%|███████▌  | 91800/121875 [28:49:38<9:07:08,  1.09s/it]

{'loss': 0.6485, 'learning_rate': 1.2338461538461539e-05, 'epoch': 2.26}


 75%|███████▌  | 91830/121875 [28:50:11<8:59:27,  1.08s/it]

{'loss': 0.5789, 'learning_rate': 1.2326153846153848e-05, 'epoch': 2.26}


 75%|███████▌  | 91860/121875 [28:50:43<9:01:16,  1.08s/it]

{'loss': 0.6434, 'learning_rate': 1.2313846153846155e-05, 'epoch': 2.26}


 75%|███████▌  | 91890/121875 [28:51:16<8:57:54,  1.08s/it]

{'loss': 0.6125, 'learning_rate': 1.2301538461538461e-05, 'epoch': 2.26}


 75%|███████▌  | 91920/121875 [28:51:48<9:02:21,  1.09s/it]

{'loss': 0.5699, 'learning_rate': 1.228923076923077e-05, 'epoch': 2.26}


 75%|███████▌  | 91950/121875 [28:52:21<8:58:03,  1.08s/it]

{'loss': 0.6551, 'learning_rate': 1.2276923076923077e-05, 'epoch': 2.26}


 75%|███████▌  | 91980/121875 [28:52:53<8:56:58,  1.08s/it]

{'loss': 0.6246, 'learning_rate': 1.2264615384615385e-05, 'epoch': 2.26}


 75%|███████▌  | 92010/121875 [28:53:28<9:12:34,  1.11s/it] 

{'loss': 0.6329, 'learning_rate': 1.2252307692307694e-05, 'epoch': 2.26}


 76%|███████▌  | 92040/121875 [28:54:01<8:56:26,  1.08s/it]

{'loss': 0.6141, 'learning_rate': 1.224e-05, 'epoch': 2.27}


 76%|███████▌  | 92070/121875 [28:54:33<8:55:37,  1.08s/it]

{'loss': 0.5912, 'learning_rate': 1.2227692307692309e-05, 'epoch': 2.27}


 76%|███████▌  | 92100/121875 [28:55:06<8:57:20,  1.08s/it]

{'loss': 0.6299, 'learning_rate': 1.2215384615384616e-05, 'epoch': 2.27}


 76%|███████▌  | 92130/121875 [28:55:38<8:56:41,  1.08s/it]

{'loss': 0.604, 'learning_rate': 1.2203076923076924e-05, 'epoch': 2.27}


 76%|███████▌  | 92160/121875 [28:56:11<8:54:54,  1.08s/it]

{'loss': 0.6339, 'learning_rate': 1.2190769230769233e-05, 'epoch': 2.27}


 76%|███████▌  | 92190/121875 [28:56:43<8:52:29,  1.08s/it]

{'loss': 0.5484, 'learning_rate': 1.2178461538461538e-05, 'epoch': 2.27}


 76%|███████▌  | 92220/121875 [28:57:15<8:55:06,  1.08s/it]

{'loss': 0.6178, 'learning_rate': 1.2166153846153846e-05, 'epoch': 2.27}


 76%|███████▌  | 92250/121875 [28:57:48<8:51:50,  1.08s/it]

{'loss': 0.6085, 'learning_rate': 1.2153846153846155e-05, 'epoch': 2.27}


 76%|███████▌  | 92280/121875 [28:58:20<8:54:32,  1.08s/it]

{'loss': 0.6131, 'learning_rate': 1.2141538461538462e-05, 'epoch': 2.27}


 76%|███████▌  | 92310/121875 [28:58:53<8:50:45,  1.08s/it]

{'loss': 0.6136, 'learning_rate': 1.212923076923077e-05, 'epoch': 2.27}


 76%|███████▌  | 92340/121875 [28:59:25<8:49:21,  1.08s/it]

{'loss': 0.6367, 'learning_rate': 1.2116923076923077e-05, 'epoch': 2.27}


 76%|███████▌  | 92370/121875 [28:59:58<8:50:42,  1.08s/it]

{'loss': 0.6386, 'learning_rate': 1.2104615384615385e-05, 'epoch': 2.27}


 76%|███████▌  | 92400/121875 [29:00:30<8:50:46,  1.08s/it]

{'loss': 0.5712, 'learning_rate': 1.2092307692307694e-05, 'epoch': 2.27}


 76%|███████▌  | 92430/121875 [29:01:02<8:48:28,  1.08s/it]

{'loss': 0.5949, 'learning_rate': 1.2080000000000001e-05, 'epoch': 2.28}


 76%|███████▌  | 92460/121875 [29:01:35<8:49:47,  1.08s/it]

{'loss': 0.5646, 'learning_rate': 1.2067692307692307e-05, 'epoch': 2.28}


 76%|███████▌  | 92490/121875 [29:02:07<8:47:21,  1.08s/it]

{'loss': 0.6578, 'learning_rate': 1.2055384615384616e-05, 'epoch': 2.28}


 76%|███████▌  | 92520/121875 [29:02:42<8:52:10,  1.09s/it] 

{'loss': 0.5733, 'learning_rate': 1.2043076923076923e-05, 'epoch': 2.28}


 76%|███████▌  | 92550/121875 [29:03:15<8:51:14,  1.09s/it]

{'loss': 0.5834, 'learning_rate': 1.2030769230769231e-05, 'epoch': 2.28}


 76%|███████▌  | 92580/121875 [29:03:47<8:47:04,  1.08s/it]

{'loss': 0.5743, 'learning_rate': 1.201846153846154e-05, 'epoch': 2.28}


 76%|███████▌  | 92610/121875 [29:04:19<8:50:17,  1.09s/it]

{'loss': 0.6225, 'learning_rate': 1.2006153846153846e-05, 'epoch': 2.28}


 76%|███████▌  | 92640/121875 [29:04:52<8:47:11,  1.08s/it]

{'loss': 0.5668, 'learning_rate': 1.1993846153846155e-05, 'epoch': 2.28}


 76%|███████▌  | 92670/121875 [29:05:24<8:46:58,  1.08s/it]

{'loss': 0.6807, 'learning_rate': 1.1981538461538462e-05, 'epoch': 2.28}


 76%|███████▌  | 92700/121875 [29:05:56<8:43:00,  1.08s/it]

{'loss': 0.6511, 'learning_rate': 1.196923076923077e-05, 'epoch': 2.28}


 76%|███████▌  | 92730/121875 [29:06:29<8:43:10,  1.08s/it]

{'loss': 0.5979, 'learning_rate': 1.1956923076923079e-05, 'epoch': 2.28}


 76%|███████▌  | 92760/121875 [29:07:01<8:49:08,  1.09s/it]

{'loss': 0.6411, 'learning_rate': 1.1944615384615385e-05, 'epoch': 2.28}


 76%|███████▌  | 92790/121875 [29:07:34<8:43:41,  1.08s/it]

{'loss': 0.6525, 'learning_rate': 1.1932307692307692e-05, 'epoch': 2.28}


 76%|███████▌  | 92820/121875 [29:08:06<8:43:37,  1.08s/it]

{'loss': 0.6309, 'learning_rate': 1.1920000000000001e-05, 'epoch': 2.28}


 76%|███████▌  | 92850/121875 [29:08:38<8:43:43,  1.08s/it]

{'loss': 0.6432, 'learning_rate': 1.1907692307692308e-05, 'epoch': 2.29}


 76%|███████▌  | 92880/121875 [29:09:11<8:40:58,  1.08s/it]

{'loss': 0.6085, 'learning_rate': 1.1895384615384616e-05, 'epoch': 2.29}


 76%|███████▌  | 92910/121875 [29:09:43<8:40:09,  1.08s/it]

{'loss': 0.6518, 'learning_rate': 1.1883076923076923e-05, 'epoch': 2.29}


 76%|███████▋  | 92940/121875 [29:10:16<8:42:06,  1.08s/it]

{'loss': 0.6054, 'learning_rate': 1.187076923076923e-05, 'epoch': 2.29}


 76%|███████▋  | 92970/121875 [29:10:48<8:43:04,  1.09s/it]

{'loss': 0.6046, 'learning_rate': 1.185846153846154e-05, 'epoch': 2.29}


 76%|███████▋  | 93000/121875 [29:11:21<8:38:55,  1.08s/it]

{'loss': 0.649, 'learning_rate': 1.1846153846153847e-05, 'epoch': 2.29}


 76%|███████▋  | 93030/121875 [29:11:55<8:39:03,  1.08s/it] 

{'loss': 0.6425, 'learning_rate': 1.1833846153846155e-05, 'epoch': 2.29}


 76%|███████▋  | 93060/121875 [29:12:28<8:38:28,  1.08s/it]

{'loss': 0.623, 'learning_rate': 1.1821538461538462e-05, 'epoch': 2.29}


 76%|███████▋  | 93090/121875 [29:13:00<8:37:07,  1.08s/it]

{'loss': 0.6682, 'learning_rate': 1.180923076923077e-05, 'epoch': 2.29}


 76%|███████▋  | 93120/121875 [29:13:33<8:46:11,  1.10s/it]

{'loss': 0.61, 'learning_rate': 1.1796923076923077e-05, 'epoch': 2.29}


 76%|███████▋  | 93150/121875 [29:14:05<8:38:31,  1.08s/it]

{'loss': 0.6579, 'learning_rate': 1.1784615384615386e-05, 'epoch': 2.29}


 76%|███████▋  | 93180/121875 [29:14:38<8:38:30,  1.08s/it]

{'loss': 0.5948, 'learning_rate': 1.1772307692307692e-05, 'epoch': 2.29}


 76%|███████▋  | 93210/121875 [29:15:10<8:38:34,  1.09s/it]

{'loss': 0.616, 'learning_rate': 1.1760000000000001e-05, 'epoch': 2.29}


 77%|███████▋  | 93240/121875 [29:15:42<8:35:47,  1.08s/it]

{'loss': 0.6988, 'learning_rate': 1.1747692307692308e-05, 'epoch': 2.3}


 77%|███████▋  | 93270/121875 [29:16:15<8:35:12,  1.08s/it]

{'loss': 0.6609, 'learning_rate': 1.1735384615384616e-05, 'epoch': 2.3}


 77%|███████▋  | 93300/121875 [29:16:47<8:37:59,  1.09s/it]

{'loss': 0.6166, 'learning_rate': 1.1723076923076925e-05, 'epoch': 2.3}


 77%|███████▋  | 93330/121875 [29:17:20<8:33:53,  1.08s/it]

{'loss': 0.599, 'learning_rate': 1.171076923076923e-05, 'epoch': 2.3}


 77%|███████▋  | 93360/121875 [29:17:52<8:37:12,  1.09s/it]

{'loss': 0.5828, 'learning_rate': 1.1698461538461538e-05, 'epoch': 2.3}


 77%|███████▋  | 93390/121875 [29:18:24<8:33:05,  1.08s/it]

{'loss': 0.5901, 'learning_rate': 1.1686153846153847e-05, 'epoch': 2.3}


 77%|███████▋  | 93420/121875 [29:18:57<8:32:09,  1.08s/it]

{'loss': 0.5977, 'learning_rate': 1.1673846153846155e-05, 'epoch': 2.3}


 77%|███████▋  | 93450/121875 [29:19:29<8:30:04,  1.08s/it]

{'loss': 0.6047, 'learning_rate': 1.1661538461538462e-05, 'epoch': 2.3}


 77%|███████▋  | 93480/121875 [29:20:02<8:30:13,  1.08s/it]

{'loss': 0.5996, 'learning_rate': 1.164923076923077e-05, 'epoch': 2.3}


 77%|███████▋  | 93510/121875 [29:20:37<8:46:53,  1.11s/it] 

{'loss': 0.6345, 'learning_rate': 1.1636923076923077e-05, 'epoch': 2.3}


 77%|███████▋  | 93540/121875 [29:21:09<8:29:36,  1.08s/it]

{'loss': 0.6326, 'learning_rate': 1.1624615384615386e-05, 'epoch': 2.3}


 77%|███████▋  | 93570/121875 [29:21:42<8:27:59,  1.08s/it]

{'loss': 0.5926, 'learning_rate': 1.1612307692307693e-05, 'epoch': 2.3}


 77%|███████▋  | 93600/121875 [29:22:14<8:28:49,  1.08s/it]

{'loss': 0.668, 'learning_rate': 1.16e-05, 'epoch': 2.3}


 77%|███████▋  | 93630/121875 [29:22:47<8:26:43,  1.08s/it]

{'loss': 0.5608, 'learning_rate': 1.1587692307692308e-05, 'epoch': 2.3}


 77%|███████▋  | 93660/121875 [29:23:19<8:30:07,  1.08s/it]

{'loss': 0.6187, 'learning_rate': 1.1575384615384616e-05, 'epoch': 2.31}


 77%|███████▋  | 93690/121875 [29:23:52<8:33:40,  1.09s/it]

{'loss': 0.6955, 'learning_rate': 1.1563076923076923e-05, 'epoch': 2.31}


 77%|███████▋  | 93720/121875 [29:24:24<8:27:22,  1.08s/it]

{'loss': 0.5684, 'learning_rate': 1.1550769230769232e-05, 'epoch': 2.31}


 77%|███████▋  | 93750/121875 [29:24:56<8:25:53,  1.08s/it]

{'loss': 0.632, 'learning_rate': 1.153846153846154e-05, 'epoch': 2.31}


 77%|███████▋  | 93780/121875 [29:25:29<8:25:14,  1.08s/it]

{'loss': 0.6217, 'learning_rate': 1.1526153846153847e-05, 'epoch': 2.31}


 77%|███████▋  | 93810/121875 [29:26:01<8:25:45,  1.08s/it]

{'loss': 0.5854, 'learning_rate': 1.1513846153846154e-05, 'epoch': 2.31}


 77%|███████▋  | 93840/121875 [29:26:34<8:32:41,  1.10s/it]

{'loss': 0.6303, 'learning_rate': 1.1501538461538462e-05, 'epoch': 2.31}


 77%|███████▋  | 93870/121875 [29:27:06<8:23:56,  1.08s/it]

{'loss': 0.6841, 'learning_rate': 1.1489230769230771e-05, 'epoch': 2.31}


 77%|███████▋  | 93900/121875 [29:27:38<8:23:57,  1.08s/it]

{'loss': 0.6114, 'learning_rate': 1.1476923076923078e-05, 'epoch': 2.31}


 77%|███████▋  | 93930/121875 [29:28:11<8:22:51,  1.08s/it]

{'loss': 0.6288, 'learning_rate': 1.1464615384615384e-05, 'epoch': 2.31}


 77%|███████▋  | 93960/121875 [29:28:43<8:23:21,  1.08s/it]

{'loss': 0.6321, 'learning_rate': 1.1452307692307693e-05, 'epoch': 2.31}


 77%|███████▋  | 93990/121875 [29:29:16<8:20:59,  1.08s/it]

{'loss': 0.5785, 'learning_rate': 1.144e-05, 'epoch': 2.31}


 77%|███████▋  | 94020/121875 [29:29:50<8:21:04,  1.08s/it] 

{'loss': 0.6405, 'learning_rate': 1.1427692307692308e-05, 'epoch': 2.31}


 77%|███████▋  | 94050/121875 [29:30:23<8:18:42,  1.08s/it]

{'loss': 0.6164, 'learning_rate': 1.1415384615384615e-05, 'epoch': 2.32}


 77%|███████▋  | 94080/121875 [29:30:55<8:19:51,  1.08s/it]

{'loss': 0.5568, 'learning_rate': 1.1403076923076923e-05, 'epoch': 2.32}


 77%|███████▋  | 94110/121875 [29:31:28<8:19:12,  1.08s/it]

{'loss': 0.5999, 'learning_rate': 1.1390769230769232e-05, 'epoch': 2.32}


 77%|███████▋  | 94140/121875 [29:32:00<8:21:35,  1.09s/it]

{'loss': 0.5944, 'learning_rate': 1.137846153846154e-05, 'epoch': 2.32}


 77%|███████▋  | 94170/121875 [29:32:32<8:16:50,  1.08s/it]

{'loss': 0.6009, 'learning_rate': 1.1366153846153847e-05, 'epoch': 2.32}


 77%|███████▋  | 94200/121875 [29:33:05<8:15:15,  1.07s/it]

{'loss': 0.6115, 'learning_rate': 1.1353846153846154e-05, 'epoch': 2.32}


 77%|███████▋  | 94230/121875 [29:33:37<8:20:11,  1.09s/it]

{'loss': 0.6831, 'learning_rate': 1.1341538461538462e-05, 'epoch': 2.32}


 77%|███████▋  | 94260/121875 [29:34:10<8:16:07,  1.08s/it]

{'loss': 0.6413, 'learning_rate': 1.1329230769230769e-05, 'epoch': 2.32}


 77%|███████▋  | 94290/121875 [29:34:42<8:14:57,  1.08s/it]

{'loss': 0.7059, 'learning_rate': 1.1316923076923078e-05, 'epoch': 2.32}


 77%|███████▋  | 94320/121875 [29:35:14<8:15:00,  1.08s/it]

{'loss': 0.5718, 'learning_rate': 1.1304615384615386e-05, 'epoch': 2.32}


 77%|███████▋  | 94350/121875 [29:35:47<8:14:20,  1.08s/it]

{'loss': 0.6334, 'learning_rate': 1.1292307692307693e-05, 'epoch': 2.32}


 77%|███████▋  | 94380/121875 [29:36:19<8:15:08,  1.08s/it]

{'loss': 0.6087, 'learning_rate': 1.128e-05, 'epoch': 2.32}


 77%|███████▋  | 94410/121875 [29:36:52<8:14:24,  1.08s/it]

{'loss': 0.5964, 'learning_rate': 1.1267692307692308e-05, 'epoch': 2.32}


 77%|███████▋  | 94440/121875 [29:37:24<8:15:31,  1.08s/it]

{'loss': 0.6605, 'learning_rate': 1.1255384615384617e-05, 'epoch': 2.32}


 78%|███████▊  | 94470/121875 [29:37:57<8:11:30,  1.08s/it]

{'loss': 0.6504, 'learning_rate': 1.1243076923076924e-05, 'epoch': 2.33}


 78%|███████▊  | 94500/121875 [29:38:29<8:13:25,  1.08s/it]

{'loss': 0.5948, 'learning_rate': 1.123076923076923e-05, 'epoch': 2.33}


 78%|███████▊  | 94530/121875 [29:39:07<8:12:40,  1.08s/it] 

{'loss': 0.6206, 'learning_rate': 1.121846153846154e-05, 'epoch': 2.33}


 78%|███████▊  | 94560/121875 [29:39:39<8:11:59,  1.08s/it]

{'loss': 0.6204, 'learning_rate': 1.1206153846153847e-05, 'epoch': 2.33}


 78%|███████▊  | 94590/121875 [29:40:11<8:12:04,  1.08s/it]

{'loss': 0.6306, 'learning_rate': 1.1193846153846154e-05, 'epoch': 2.33}


 78%|███████▊  | 94620/121875 [29:40:44<8:11:05,  1.08s/it]

{'loss': 0.5556, 'learning_rate': 1.1181538461538463e-05, 'epoch': 2.33}


 78%|███████▊  | 94650/121875 [29:41:16<8:09:50,  1.08s/it]

{'loss': 0.5328, 'learning_rate': 1.1169230769230769e-05, 'epoch': 2.33}


 78%|███████▊  | 94680/121875 [29:41:49<8:09:56,  1.08s/it]

{'loss': 0.6436, 'learning_rate': 1.1156923076923078e-05, 'epoch': 2.33}


 78%|███████▊  | 94710/121875 [29:42:22<8:35:15,  1.14s/it]

{'loss': 0.6371, 'learning_rate': 1.1144615384615385e-05, 'epoch': 2.33}


 78%|███████▊  | 94740/121875 [29:42:55<8:34:46,  1.14s/it]

{'loss': 0.5757, 'learning_rate': 1.1132307692307693e-05, 'epoch': 2.33}


 78%|███████▊  | 94770/121875 [29:43:29<8:26:25,  1.12s/it]

{'loss': 0.6375, 'learning_rate': 1.112e-05, 'epoch': 2.33}


 78%|███████▊  | 94800/121875 [29:44:02<8:16:11,  1.10s/it]

{'loss': 0.5953, 'learning_rate': 1.1107692307692308e-05, 'epoch': 2.33}


 78%|███████▊  | 94830/121875 [29:44:35<8:14:39,  1.10s/it]

{'loss': 0.6325, 'learning_rate': 1.1095384615384615e-05, 'epoch': 2.33}


 78%|███████▊  | 94860/121875 [29:45:09<8:29:00,  1.13s/it]

{'loss': 0.5987, 'learning_rate': 1.1083076923076924e-05, 'epoch': 2.34}


 78%|███████▊  | 94890/121875 [29:45:43<8:30:13,  1.13s/it]

{'loss': 0.6074, 'learning_rate': 1.1070769230769232e-05, 'epoch': 2.34}


 78%|███████▊  | 94920/121875 [29:46:17<8:26:44,  1.13s/it]

{'loss': 0.6294, 'learning_rate': 1.1058461538461539e-05, 'epoch': 2.34}


 78%|███████▊  | 94950/121875 [29:46:51<8:27:35,  1.13s/it]

{'loss': 0.6288, 'learning_rate': 1.1046153846153846e-05, 'epoch': 2.34}


 78%|███████▊  | 94980/121875 [29:47:25<8:27:41,  1.13s/it]

{'loss': 0.572, 'learning_rate': 1.1033846153846154e-05, 'epoch': 2.34}


 78%|███████▊  | 95010/121875 [29:48:04<8:53:41,  1.19s/it] 

{'loss': 0.5955, 'learning_rate': 1.1021538461538463e-05, 'epoch': 2.34}


 78%|███████▊  | 95040/121875 [29:48:38<8:25:07,  1.13s/it]

{'loss': 0.5909, 'learning_rate': 1.100923076923077e-05, 'epoch': 2.34}


 78%|███████▊  | 95070/121875 [29:49:12<8:25:29,  1.13s/it]

{'loss': 0.5575, 'learning_rate': 1.0996923076923076e-05, 'epoch': 2.34}


 78%|███████▊  | 95100/121875 [29:49:46<8:22:14,  1.13s/it]

{'loss': 0.6376, 'learning_rate': 1.0984615384615385e-05, 'epoch': 2.34}


 78%|███████▊  | 95130/121875 [29:50:20<8:24:09,  1.13s/it]

{'loss': 0.5888, 'learning_rate': 1.0972307692307693e-05, 'epoch': 2.34}


 78%|███████▊  | 95160/121875 [29:50:53<8:23:59,  1.13s/it]

{'loss': 0.5651, 'learning_rate': 1.096e-05, 'epoch': 2.34}


 78%|███████▊  | 95190/121875 [29:51:27<8:20:05,  1.12s/it]

{'loss': 0.6705, 'learning_rate': 1.094769230769231e-05, 'epoch': 2.34}


 78%|███████▊  | 95220/121875 [29:52:01<8:23:10,  1.13s/it]

{'loss': 0.6033, 'learning_rate': 1.0935384615384615e-05, 'epoch': 2.34}


 78%|███████▊  | 95250/121875 [29:52:35<8:19:07,  1.12s/it]

{'loss': 0.7058, 'learning_rate': 1.0923076923076924e-05, 'epoch': 2.34}


 78%|███████▊  | 95280/121875 [29:53:09<8:20:45,  1.13s/it]

{'loss': 0.5795, 'learning_rate': 1.0910769230769231e-05, 'epoch': 2.35}


 78%|███████▊  | 95310/121875 [29:53:43<8:05:41,  1.10s/it]

{'loss': 0.6523, 'learning_rate': 1.0898461538461539e-05, 'epoch': 2.35}


 78%|███████▊  | 95340/121875 [29:54:15<7:57:26,  1.08s/it]

{'loss': 0.6803, 'learning_rate': 1.0886153846153848e-05, 'epoch': 2.35}


 78%|███████▊  | 95370/121875 [29:54:48<7:57:56,  1.08s/it]

{'loss': 0.6631, 'learning_rate': 1.0873846153846154e-05, 'epoch': 2.35}


 78%|███████▊  | 95400/121875 [29:55:20<7:55:23,  1.08s/it]

{'loss': 0.5623, 'learning_rate': 1.0861538461538461e-05, 'epoch': 2.35}


 78%|███████▊  | 95430/121875 [29:55:52<8:00:04,  1.09s/it]

{'loss': 0.6657, 'learning_rate': 1.084923076923077e-05, 'epoch': 2.35}


 78%|███████▊  | 95460/121875 [29:56:25<8:00:23,  1.09s/it]

{'loss': 0.647, 'learning_rate': 1.0836923076923078e-05, 'epoch': 2.35}


 78%|███████▊  | 95490/121875 [29:56:57<7:55:13,  1.08s/it]

{'loss': 0.5841, 'learning_rate': 1.0824615384615385e-05, 'epoch': 2.35}


 78%|███████▊  | 95520/121875 [29:57:36<7:56:50,  1.09s/it] 

{'loss': 0.6139, 'learning_rate': 1.0812307692307693e-05, 'epoch': 2.35}


 78%|███████▊  | 95550/121875 [29:58:08<7:52:17,  1.08s/it]

{'loss': 0.592, 'learning_rate': 1.08e-05, 'epoch': 2.35}


 78%|███████▊  | 95580/121875 [29:58:40<7:53:54,  1.08s/it]

{'loss': 0.6061, 'learning_rate': 1.0787692307692309e-05, 'epoch': 2.35}


 78%|███████▊  | 95610/121875 [29:59:13<7:53:46,  1.08s/it]

{'loss': 0.6184, 'learning_rate': 1.0775384615384616e-05, 'epoch': 2.35}


 78%|███████▊  | 95640/121875 [29:59:45<7:56:29,  1.09s/it]

{'loss': 0.5834, 'learning_rate': 1.0763076923076922e-05, 'epoch': 2.35}


 78%|███████▊  | 95670/121875 [30:00:18<7:50:51,  1.08s/it]

{'loss': 0.5611, 'learning_rate': 1.0750769230769231e-05, 'epoch': 2.35}


 79%|███████▊  | 95700/121875 [30:00:50<7:50:54,  1.08s/it]

{'loss': 0.6201, 'learning_rate': 1.0738461538461539e-05, 'epoch': 2.36}


 79%|███████▊  | 95730/121875 [30:01:23<7:49:01,  1.08s/it]

{'loss': 0.6385, 'learning_rate': 1.0726153846153846e-05, 'epoch': 2.36}


 79%|███████▊  | 95760/121875 [30:01:55<7:49:03,  1.08s/it]

{'loss': 0.5913, 'learning_rate': 1.0713846153846155e-05, 'epoch': 2.36}


 79%|███████▊  | 95790/121875 [30:02:28<7:50:11,  1.08s/it]

{'loss': 0.5893, 'learning_rate': 1.0701538461538461e-05, 'epoch': 2.36}


 79%|███████▊  | 95820/121875 [30:03:00<7:47:55,  1.08s/it]

{'loss': 0.6165, 'learning_rate': 1.068923076923077e-05, 'epoch': 2.36}


 79%|███████▊  | 95850/121875 [30:03:33<7:48:40,  1.08s/it]

{'loss': 0.5999, 'learning_rate': 1.0676923076923078e-05, 'epoch': 2.36}


 79%|███████▊  | 95880/121875 [30:04:05<7:50:14,  1.09s/it]

{'loss': 0.6399, 'learning_rate': 1.0664615384615385e-05, 'epoch': 2.36}


 79%|███████▊  | 95910/121875 [30:04:38<7:46:52,  1.08s/it]

{'loss': 0.6301, 'learning_rate': 1.0652307692307694e-05, 'epoch': 2.36}


 79%|███████▊  | 95940/121875 [30:05:10<7:49:26,  1.09s/it]

{'loss': 0.6202, 'learning_rate': 1.064e-05, 'epoch': 2.36}


 79%|███████▊  | 95970/121875 [30:05:43<7:47:27,  1.08s/it]

{'loss': 0.661, 'learning_rate': 1.0627692307692307e-05, 'epoch': 2.36}


 79%|███████▉  | 96000/121875 [30:06:15<7:47:14,  1.08s/it]

{'loss': 0.5911, 'learning_rate': 1.0615384615384616e-05, 'epoch': 2.36}


 79%|███████▉  | 96030/121875 [30:06:53<7:44:44,  1.08s/it] 

{'loss': 0.6384, 'learning_rate': 1.0603076923076924e-05, 'epoch': 2.36}


 79%|███████▉  | 96060/121875 [30:07:25<7:45:07,  1.08s/it]

{'loss': 0.5356, 'learning_rate': 1.0590769230769231e-05, 'epoch': 2.36}


 79%|███████▉  | 96090/121875 [30:07:58<7:47:06,  1.09s/it]

{'loss': 0.5588, 'learning_rate': 1.0578461538461539e-05, 'epoch': 2.37}


 79%|███████▉  | 96120/121875 [30:08:30<7:43:56,  1.08s/it]

{'loss': 0.574, 'learning_rate': 1.0566153846153846e-05, 'epoch': 2.37}


 79%|███████▉  | 96150/121875 [30:09:03<7:42:59,  1.08s/it]

{'loss': 0.6194, 'learning_rate': 1.0553846153846155e-05, 'epoch': 2.37}


 79%|███████▉  | 96180/121875 [30:09:35<7:41:39,  1.08s/it]

{'loss': 0.5959, 'learning_rate': 1.0541538461538463e-05, 'epoch': 2.37}


 79%|███████▉  | 96210/121875 [30:10:08<7:47:02,  1.09s/it]

{'loss': 0.6304, 'learning_rate': 1.052923076923077e-05, 'epoch': 2.37}


 79%|███████▉  | 96240/121875 [30:10:40<7:43:29,  1.08s/it]

{'loss': 0.6168, 'learning_rate': 1.0516923076923077e-05, 'epoch': 2.37}


 79%|███████▉  | 96270/121875 [30:11:13<7:40:02,  1.08s/it]

{'loss': 0.5674, 'learning_rate': 1.0504615384615385e-05, 'epoch': 2.37}


 79%|███████▉  | 96300/121875 [30:11:45<7:40:10,  1.08s/it]

{'loss': 0.6083, 'learning_rate': 1.0492307692307692e-05, 'epoch': 2.37}


 79%|███████▉  | 96330/121875 [30:12:17<7:37:26,  1.07s/it]

{'loss': 0.6263, 'learning_rate': 1.0480000000000001e-05, 'epoch': 2.37}


 79%|███████▉  | 96360/121875 [30:12:50<7:39:05,  1.08s/it]

{'loss': 0.6007, 'learning_rate': 1.0467692307692307e-05, 'epoch': 2.37}


 79%|███████▉  | 96390/121875 [30:13:22<7:37:58,  1.08s/it]

{'loss': 0.6252, 'learning_rate': 1.0455384615384616e-05, 'epoch': 2.37}


 79%|███████▉  | 96420/121875 [30:13:55<7:40:15,  1.08s/it]

{'loss': 0.6024, 'learning_rate': 1.0443076923076924e-05, 'epoch': 2.37}


 79%|███████▉  | 96450/121875 [30:14:27<7:36:47,  1.08s/it]

{'loss': 0.6086, 'learning_rate': 1.0430769230769231e-05, 'epoch': 2.37}


 79%|███████▉  | 96480/121875 [30:15:00<7:37:11,  1.08s/it]

{'loss': 0.638, 'learning_rate': 1.041846153846154e-05, 'epoch': 2.37}


 79%|███████▉  | 96510/121875 [30:15:38<8:01:34,  1.14s/it] 

{'loss': 0.6697, 'learning_rate': 1.0406153846153846e-05, 'epoch': 2.38}


 79%|███████▉  | 96540/121875 [30:16:10<7:37:01,  1.08s/it]

{'loss': 0.6078, 'learning_rate': 1.0393846153846153e-05, 'epoch': 2.38}


 79%|███████▉  | 96570/121875 [30:16:42<7:35:19,  1.08s/it]

{'loss': 0.5754, 'learning_rate': 1.0381538461538462e-05, 'epoch': 2.38}


 79%|███████▉  | 96600/121875 [30:17:15<7:34:22,  1.08s/it]

{'loss': 0.635, 'learning_rate': 1.036923076923077e-05, 'epoch': 2.38}


 79%|███████▉  | 96630/121875 [30:17:47<7:35:13,  1.08s/it]

{'loss': 0.603, 'learning_rate': 1.0356923076923077e-05, 'epoch': 2.38}


 79%|███████▉  | 96660/121875 [30:18:20<7:33:18,  1.08s/it]

{'loss': 0.6399, 'learning_rate': 1.0344615384615385e-05, 'epoch': 2.38}


 79%|███████▉  | 96690/121875 [30:18:52<7:35:27,  1.09s/it]

{'loss': 0.5832, 'learning_rate': 1.0332307692307692e-05, 'epoch': 2.38}


 79%|███████▉  | 96720/121875 [30:19:25<7:37:59,  1.09s/it]

{'loss': 0.6412, 'learning_rate': 1.0320000000000001e-05, 'epoch': 2.38}


 79%|███████▉  | 96750/121875 [30:19:57<7:33:55,  1.08s/it]

{'loss': 0.622, 'learning_rate': 1.0307692307692309e-05, 'epoch': 2.38}


 79%|███████▉  | 96780/121875 [30:20:29<7:31:52,  1.08s/it]

{'loss': 0.6187, 'learning_rate': 1.0295384615384616e-05, 'epoch': 2.38}


 79%|███████▉  | 96810/121875 [30:21:02<7:32:29,  1.08s/it]

{'loss': 0.5351, 'learning_rate': 1.0283076923076923e-05, 'epoch': 2.38}


 79%|███████▉  | 96840/121875 [30:21:34<7:31:23,  1.08s/it]

{'loss': 0.6168, 'learning_rate': 1.027076923076923e-05, 'epoch': 2.38}


 79%|███████▉  | 96870/121875 [30:22:07<7:29:32,  1.08s/it]

{'loss': 0.5781, 'learning_rate': 1.0258461538461538e-05, 'epoch': 2.38}


 80%|███████▉  | 96900/121875 [30:22:39<7:28:51,  1.08s/it]

{'loss': 0.6527, 'learning_rate': 1.0246153846153847e-05, 'epoch': 2.39}


 80%|███████▉  | 96930/121875 [30:23:12<7:34:05,  1.09s/it]

{'loss': 0.6503, 'learning_rate': 1.0233846153846155e-05, 'epoch': 2.39}


 80%|███████▉  | 96960/121875 [30:23:44<7:24:47,  1.07s/it]

{'loss': 0.5942, 'learning_rate': 1.0221538461538462e-05, 'epoch': 2.39}


 80%|███████▉  | 96990/121875 [30:24:17<7:29:25,  1.08s/it]

{'loss': 0.6425, 'learning_rate': 1.020923076923077e-05, 'epoch': 2.39}


 80%|███████▉  | 97020/121875 [30:24:54<7:26:14,  1.08s/it] 

{'loss': 0.608, 'learning_rate': 1.0196923076923077e-05, 'epoch': 2.39}


 80%|███████▉  | 97050/121875 [30:25:27<7:24:07,  1.07s/it]

{'loss': 0.575, 'learning_rate': 1.0184615384615386e-05, 'epoch': 2.39}


 80%|███████▉  | 97080/121875 [30:25:59<7:27:42,  1.08s/it]

{'loss': 0.5937, 'learning_rate': 1.0172307692307694e-05, 'epoch': 2.39}


 80%|███████▉  | 97110/121875 [30:26:32<7:23:41,  1.07s/it]

{'loss': 0.6122, 'learning_rate': 1.016e-05, 'epoch': 2.39}


 80%|███████▉  | 97140/121875 [30:27:04<7:30:44,  1.09s/it]

{'loss': 0.6036, 'learning_rate': 1.0147692307692308e-05, 'epoch': 2.39}


 80%|███████▉  | 97170/121875 [30:27:38<7:44:18,  1.13s/it]

{'loss': 0.6118, 'learning_rate': 1.0135384615384616e-05, 'epoch': 2.39}


 80%|███████▉  | 97200/121875 [30:28:12<7:42:46,  1.13s/it]

{'loss': 0.6285, 'learning_rate': 1.0123076923076923e-05, 'epoch': 2.39}


 80%|███████▉  | 97230/121875 [30:28:46<7:46:08,  1.13s/it]

{'loss': 0.6468, 'learning_rate': 1.011076923076923e-05, 'epoch': 2.39}


 80%|███████▉  | 97260/121875 [30:29:20<7:45:35,  1.13s/it]

{'loss': 0.6152, 'learning_rate': 1.0098461538461538e-05, 'epoch': 2.39}


 80%|███████▉  | 97290/121875 [30:29:53<7:43:17,  1.13s/it]

{'loss': 0.632, 'learning_rate': 1.0086153846153847e-05, 'epoch': 2.39}


 80%|███████▉  | 97320/121875 [30:30:27<7:43:06,  1.13s/it]

{'loss': 0.5495, 'learning_rate': 1.0073846153846155e-05, 'epoch': 2.4}


 80%|███████▉  | 97350/121875 [30:31:01<7:40:20,  1.13s/it]

{'loss': 0.5764, 'learning_rate': 1.0061538461538462e-05, 'epoch': 2.4}


 80%|███████▉  | 97380/121875 [30:31:35<7:41:17,  1.13s/it]

{'loss': 0.5956, 'learning_rate': 1.004923076923077e-05, 'epoch': 2.4}


 80%|███████▉  | 97410/121875 [30:32:09<7:38:57,  1.13s/it]

{'loss': 0.6273, 'learning_rate': 1.0036923076923077e-05, 'epoch': 2.4}


 80%|███████▉  | 97440/121875 [30:32:43<7:41:57,  1.13s/it]

{'loss': 0.6309, 'learning_rate': 1.0024615384615384e-05, 'epoch': 2.4}


 80%|███████▉  | 97470/121875 [30:33:17<7:40:29,  1.13s/it]

{'loss': 0.5659, 'learning_rate': 1.0012307692307693e-05, 'epoch': 2.4}


 80%|████████  | 97500/121875 [30:33:50<7:24:19,  1.09s/it]

{'loss': 0.6393, 'learning_rate': 1e-05, 'epoch': 2.4}


 80%|████████  | 97530/121875 [30:34:29<7:24:56,  1.10s/it] 

{'loss': 0.6291, 'learning_rate': 9.987692307692308e-06, 'epoch': 2.4}


 80%|████████  | 97560/121875 [30:35:02<7:39:46,  1.13s/it]

{'loss': 0.5972, 'learning_rate': 9.975384615384616e-06, 'epoch': 2.4}


 80%|████████  | 97590/121875 [30:35:36<7:36:48,  1.13s/it]

{'loss': 0.6784, 'learning_rate': 9.963076923076923e-06, 'epoch': 2.4}


 80%|████████  | 97620/121875 [30:36:10<7:34:35,  1.12s/it]

{'loss': 0.5462, 'learning_rate': 9.950769230769232e-06, 'epoch': 2.4}


 80%|████████  | 97650/121875 [30:36:44<7:35:25,  1.13s/it]

{'loss': 0.6236, 'learning_rate': 9.93846153846154e-06, 'epoch': 2.4}


 80%|████████  | 97680/121875 [30:37:18<7:37:23,  1.13s/it]

{'loss': 0.5932, 'learning_rate': 9.926153846153845e-06, 'epoch': 2.4}


 80%|████████  | 97710/121875 [30:37:51<7:34:41,  1.13s/it]

{'loss': 0.6218, 'learning_rate': 9.913846153846154e-06, 'epoch': 2.41}


 80%|████████  | 97740/121875 [30:38:25<7:36:54,  1.14s/it]

{'loss': 0.6291, 'learning_rate': 9.901538461538462e-06, 'epoch': 2.41}


 80%|████████  | 97770/121875 [30:38:59<7:35:31,  1.13s/it]

{'loss': 0.6097, 'learning_rate': 9.88923076923077e-06, 'epoch': 2.41}


 80%|████████  | 97800/121875 [30:39:33<7:29:23,  1.12s/it]

{'loss': 0.5895, 'learning_rate': 9.876923076923078e-06, 'epoch': 2.41}


 80%|████████  | 97830/121875 [30:40:06<7:12:13,  1.08s/it]

{'loss': 0.5898, 'learning_rate': 9.864615384615384e-06, 'epoch': 2.41}


 80%|████████  | 97860/121875 [30:40:38<7:12:32,  1.08s/it]

{'loss': 0.6275, 'learning_rate': 9.852307692307693e-06, 'epoch': 2.41}


 80%|████████  | 97890/121875 [30:41:11<7:16:09,  1.09s/it]

{'loss': 0.6125, 'learning_rate': 9.84e-06, 'epoch': 2.41}


 80%|████████  | 97920/121875 [30:41:43<7:13:27,  1.09s/it]

{'loss': 0.6151, 'learning_rate': 9.827692307692308e-06, 'epoch': 2.41}


 80%|████████  | 97950/121875 [30:42:16<7:14:26,  1.09s/it]

{'loss': 0.5677, 'learning_rate': 9.815384615384615e-06, 'epoch': 2.41}


 80%|████████  | 97980/121875 [30:42:48<7:10:45,  1.08s/it]

{'loss': 0.6063, 'learning_rate': 9.803076923076923e-06, 'epoch': 2.41}


 80%|████████  | 98010/121875 [30:43:27<7:42:36,  1.16s/it] 

{'loss': 0.5365, 'learning_rate': 9.79076923076923e-06, 'epoch': 2.41}


 80%|████████  | 98040/121875 [30:43:59<7:05:51,  1.07s/it]

{'loss': 0.5948, 'learning_rate': 9.77846153846154e-06, 'epoch': 2.41}


 80%|████████  | 98070/121875 [30:44:31<7:04:32,  1.07s/it]

{'loss': 0.5997, 'learning_rate': 9.766153846153847e-06, 'epoch': 2.41}


 80%|████████  | 98100/121875 [30:45:03<7:09:53,  1.08s/it]

{'loss': 0.6482, 'learning_rate': 9.753846153846154e-06, 'epoch': 2.41}


 81%|████████  | 98130/121875 [30:45:36<7:05:08,  1.07s/it]

{'loss': 0.6084, 'learning_rate': 9.741538461538462e-06, 'epoch': 2.42}


 81%|████████  | 98160/121875 [30:46:08<7:22:40,  1.12s/it]

{'loss': 0.627, 'learning_rate': 9.729230769230769e-06, 'epoch': 2.42}


 81%|████████  | 98190/121875 [30:46:42<7:26:33,  1.13s/it]

{'loss': 0.6002, 'learning_rate': 9.716923076923078e-06, 'epoch': 2.42}


 81%|████████  | 98220/121875 [30:47:16<7:24:26,  1.13s/it]

{'loss': 0.5527, 'learning_rate': 9.704615384615386e-06, 'epoch': 2.42}


 81%|████████  | 98250/121875 [30:47:50<7:20:53,  1.12s/it]

{'loss': 0.6038, 'learning_rate': 9.692307692307691e-06, 'epoch': 2.42}


 81%|████████  | 98280/121875 [30:48:24<7:25:06,  1.13s/it]

{'loss': 0.6489, 'learning_rate': 9.68e-06, 'epoch': 2.42}


 81%|████████  | 98310/121875 [30:48:58<7:26:51,  1.14s/it]

{'loss': 0.5585, 'learning_rate': 9.667692307692308e-06, 'epoch': 2.42}


 81%|████████  | 98340/121875 [30:49:32<7:23:15,  1.13s/it]

{'loss': 0.5993, 'learning_rate': 9.655384615384615e-06, 'epoch': 2.42}


 81%|████████  | 98370/121875 [30:50:05<7:24:03,  1.13s/it]

{'loss': 0.5941, 'learning_rate': 9.643076923076924e-06, 'epoch': 2.42}


 81%|████████  | 98400/121875 [30:50:39<7:19:46,  1.12s/it]

{'loss': 0.606, 'learning_rate': 9.63076923076923e-06, 'epoch': 2.42}


 81%|████████  | 98430/121875 [30:51:13<7:18:19,  1.12s/it]

{'loss': 0.6007, 'learning_rate': 9.61846153846154e-06, 'epoch': 2.42}


 81%|████████  | 98460/121875 [30:51:47<7:22:12,  1.13s/it]

{'loss': 0.6242, 'learning_rate': 9.606153846153847e-06, 'epoch': 2.42}


 81%|████████  | 98490/121875 [30:52:21<7:22:03,  1.13s/it]

{'loss': 0.6068, 'learning_rate': 9.593846153846154e-06, 'epoch': 2.42}


 81%|████████  | 98500/121875 [30:52:32<7:16:47,  1.12s/it]

RuntimeError: [enforce fail at inline_container.cc:424] . unexpected pos 267843008 vs 267842896
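
The run above stopped at step 98500 of 121875, but the logged `learning_rate` and `epoch` values can still be checked by hand. The sketch below assumes a linear decay schedule with no warmup, starting from 5e-5 (the `TrainingArguments` default) and running for 3 epochs. None of these settings appear in the log itself; they are inferred from the numbers. Under those assumptions the formula reproduces the logged values at steps 94800, 97500 and 98490.

```python
# Assumed values, inferred from the logs above rather than stated in them.
INITIAL_LR = 5e-5       # assumption: the TrainingArguments default learning rate
TOTAL_STEPS = 121_875   # total optimizer steps, read from the progress bar
NUM_EPOCHS = 3          # assumption: 121875 steps / 3 epochs = 40625 steps per epoch


def linear_lr(step: int) -> float:
    """Learning rate after `step` optimizer steps under linear decay without warmup."""
    return INITIAL_LR * (TOTAL_STEPS - step) / TOTAL_STEPS


def epoch_at(step: int) -> float:
    """Fractional epoch reached after `step` optimizer steps."""
    return NUM_EPOCHS * step / TOTAL_STEPS


# Compare against the log entries that follow these progress-bar steps.
for step in (94_800, 97_500, 98_490):
    print(f"step {step}: lr={linear_lr(step):.6e}, epoch={epoch_at(step):.2f}")
```

With 40625 steps per epoch and the 650,000 training samples described earlier, the implied training batch size would be 16. This is an inference from the arithmetic, not a value shown in the log.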

In [19]:
#small_test_dataset = tokenized_datasets["test"].shuffle(seed=64).select(range(100))
full_test_dataset = tokenized_datasets["test"].shuffle(seed=64)

The history saving thread hit an unexpected error (OperationalError('unable to open database file')).History will not be written to the database.


In [22]:
trainer.evaluate(full_test_dataset)

                                                           
 81%|████████  | 98500/121875 [31:37:51<7:16:47,  1.12s/it]

{'eval_loss': 0.7419986724853516, 'eval_accuracy': 0.68586, 'eval_runtime': 941.4775, 'eval_samples_per_second': 53.108, 'eval_steps_per_second': 6.639, 'epoch': 2.42}


{'eval_loss': 0.7419986724853516,
 'eval_accuracy': 0.68586,
 'eval_runtime': 941.4775,
 'eval_samples_per_second': 53.108,
 'eval_steps_per_second': 6.639,
 'epoch': 2.42}
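
As a quick sanity check, the evaluation numbers above can be related to one another with a little arithmetic. The sketch assumes the full test split contains 50,000 reviews, as stated in the dataset description. The evaluation batch size is not shown in the output; it is inferred here from the two throughput figures.

```python
# Values copied from the evaluation output above.
eval_runtime = 941.4775        # seconds
eval_accuracy = 0.68586
samples_per_second = 53.108
steps_per_second = 6.639

# Assumption: the full test split has 50,000 reviews (see the dataset description).
num_test_samples = 50_000

# Throughput should equal the number of samples divided by the runtime.
throughput = num_test_samples / eval_runtime

# Accuracy is the fraction of correct predictions, so this recovers the count.
correct = round(eval_accuracy * num_test_samples)

# Samples per step is the effective evaluation batch size.
batch_size = samples_per_second / steps_per_second

print(f"throughput: {throughput:.3f} samples/s")
print(f"correct predictions: {correct} of {num_test_samples}")
print(f"implied eval batch size: {batch_size:.1f}")
```

For a five-class task with balanced classes, random guessing would give an accuracy of about 0.2, which puts the reported 0.68586 in context.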

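### Saving the model and training state

- Use the `trainer.save_model` method to save the model; it can later be reloaded with the `from_pretrained()` method.
- Use the `trainer.save_state` method to save the training state.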

In [23]:
trainer.save_model(model_dir)

In [21]:
trainer.save_state()

In [23]:
# trainer.model.save_pretrained("./")
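
A minimal sketch of reloading the saved model for inference follows. It assumes the model is a sequence-classification model (so `AutoModelForSequenceClassification` applies) and that the tokenizer was not saved alongside it, so it is loaded from the base checkpoint instead. The default name `"bert-base-cased"` and the helpers `load_finetuned` and `predict_label` are illustrative assumptions, not part of the notebook above.

```python
def load_finetuned(model_dir: str, tokenizer_name: str = "bert-base-cased"):
    """Reload a model written by trainer.save_model(model_dir).

    `tokenizer_name` is an assumption: replace it with the checkpoint
    that was actually used to tokenize the training data.
    """
    # Imported lazily so that defining this helper does not require transformers.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    model = AutoModelForSequenceClassification.from_pretrained(model_dir)
    tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)
    model.eval()  # disable dropout for inference
    return model, tokenizer


def predict_label(model, tokenizer, text: str) -> int:
    """Return the index of the highest-scoring class for one review."""
    import torch

    inputs = tokenizer(text, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return int(logits.argmax(dim=-1).item())


# Example usage (requires a directory previously written by trainer.save_model):
# model, tokenizer = load_finetuned(model_dir)
# print(predict_label(model, tokenizer, "The food was great and the staff were friendly."))
```

The predicted index is a label id in the range 0 to 4, matching the `label` field of the dataset.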

## Homework: Train on the full YelpReviewFull dataset and see how high the accuracy (Acc) can go