## **T5-small Fine-tuning on ToTTo**

From : [JooYoung Song](https://github.com/Song-Joo-Young/ToTTo-Fine-tuning-in-colab/tree/main)

Code Reference :
* ToTTo : https://github.com/google-research-datasets/ToTTo
* Prompt-Tuning-on-ToTTo : https://github.com/ChainsmokersAI/Prompt-Tuning-on-ToTTo

In [1]:
# Google Drive Mount

from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [2]:
# Get Dataset

!wget https://storage.googleapis.com/totto-public/totto_data.zip
!unzip totto_data.zip

--2024-02-03 06:13:31--  https://storage.googleapis.com/totto-public/totto_data.zip
Resolving storage.googleapis.com (storage.googleapis.com)... 74.125.20.207, 74.125.197.207, 74.125.135.207, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|74.125.20.207|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 187724372 (179M) [application/zip]
Saving to: ‘totto_data.zip’


2024-02-03 06:13:33 (93.6 MB/s) - ‘totto_data.zip’ saved [187724372/187724372]

Archive:  totto_data.zip
  inflating: totto_data/totto_dev_data.jsonl  
  inflating: totto_data/totto_train_data.jsonl  
  inflating: totto_data/unlabeled_totto_test_data.jsonl  


In [3]:
# 드라이브에 데이터셋 저장 추후 가중치도 저장할 폴더
# Copy Dataset to your Google Drive
import shutil
import os

source_folder = '/content/totto_data'
destination_folder = '/content/drive/MyDrive/ToTTo_data'

if os.path.exists(destination_folder):
    shutil.rmtree(destination_folder)

shutil.copytree(source_folder, destination_folder)

'/content/drive/MyDrive/ToTTo_data'

### **1. Preprocessing**

In [4]:
!pip install transformers datasets sentencepiece

Collecting datasets
  Downloading datasets-2.16.1-py3-none-any.whl (507 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m507.1/507.1 kB[0m [31m3.6 MB/s[0m eta [36m0:00:00[0m
Collecting dill<0.3.8,>=0.3.0 (from datasets)
  Downloading dill-0.3.7-py3-none-any.whl (115 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m115.3/115.3 kB[0m [31m4.0 MB/s[0m eta [36m0:00:00[0m
Collecting multiprocess (from datasets)
  Downloading multiprocess-0.70.16-py310-none-any.whl (134 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m134.8/134.8 kB[0m [31m5.1 MB/s[0m eta [36m0:00:00[0m
INFO: pip is looking at multiple versions of multiprocess to determine which version is compatible with other requirements. This could take a while.
  Downloading multiprocess-0.70.15-py310-none-any.whl (134 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m134.8/134.8 kB[0m [31m5.3 MB/s[0m eta [36m0:00:00[0m
Installing collected pac

In [5]:
# Load Train Set
# with open('/content/totto_data/totto_train_data.jsonl', 'r') as f:
with open('/content/drive/MyDrive/ToTTo_data/totto_train_data.jsonl', 'r') as f:
    data_train=f.read().splitlines()
    f.close()

# Number of Train Data
len(data_train)

120761

In [6]:
import json

# Sample Data
data_sample=json.loads(data_train[-1])

# Key-Value Set
for key, value in data_sample.items():
    # if key=='table': continue

    print('→', key, '\n \t ', value)

→ table 
 	  [[{'value': 'Rank', 'is_header': True, 'column_span': 1, 'row_span': 1}, {'value': 'Lane', 'is_header': True, 'column_span': 1, 'row_span': 1}, {'value': 'Name', 'is_header': True, 'column_span': 1, 'row_span': 1}, {'value': 'Nationality', 'is_header': True, 'column_span': 1, 'row_span': 1}, {'value': 'Time', 'is_header': True, 'column_span': 1, 'row_span': 1}, {'value': 'Notes', 'is_header': True, 'column_span': 1, 'row_span': 1}], [{'value': '', 'is_header': False, 'column_span': 1, 'row_span': 1}, {'value': '4', 'is_header': False, 'column_span': 1, 'row_span': 1}, {'value': 'Matt Grevers', 'is_header': False, 'column_span': 1, 'row_span': 1}, {'value': 'United States', 'is_header': False, 'column_span': 1, 'row_span': 1}, {'value': '52.16', 'is_header': False, 'column_span': 1, 'row_span': 1}, {'value': 'OR', 'is_header': False, 'column_span': 1, 'row_span': 1}], [{'value': '', 'is_header': False, 'column_span': 1, 'row_span': 1}, {'value': '2', 'is_header': False, 'co

In [7]:
# Google's Official Preprocess Codes
# https://github.com/google-research/language/blob/master/language/totto/baseline_preprocessing/preprocess_utils.py

import copy

def _add_adjusted_col_offsets(table):
  """Add adjusted column offsets to take into account multi-column cells."""
  adjusted_table = []
  for row in table:
    real_col_index = 0
    adjusted_row = []
    for cell in row:
      adjusted_cell = copy.deepcopy(cell)
      adjusted_cell["adjusted_col_start"] = real_col_index
      adjusted_cell["adjusted_col_end"] = (
          adjusted_cell["adjusted_col_start"] + adjusted_cell["column_span"])
      real_col_index += adjusted_cell["column_span"]
      adjusted_row.append(adjusted_cell)
    adjusted_table.append(adjusted_row)
  return adjusted_table


def _get_heuristic_row_headers(adjusted_table, row_index, col_index):
  """Heuristic to find row headers."""
  row_headers = []
  row = adjusted_table[row_index]
  for i in range(0, col_index):
    if row[i]["is_header"]:
      row_headers.append(row[i])
  return row_headers


def _get_heuristic_col_headers(adjusted_table, row_index, col_index):
  """Heuristic to find column headers."""
  adjusted_cell = adjusted_table[row_index][col_index]
  adjusted_col_start = adjusted_cell["adjusted_col_start"]
  adjusted_col_end = adjusted_cell["adjusted_col_end"]
  col_headers = []
  for r in range(0, row_index):
    row = adjusted_table[r]
    for cell in row:
      if (cell["adjusted_col_start"] < adjusted_col_end and
          cell["adjusted_col_end"] > adjusted_col_start):
        if cell["is_header"]:
          col_headers.append(cell)

  return col_headers


def get_highlighted_subtable(table, cell_indices, with_heuristic_headers=False):
  """Extract out the highlighted part of a table."""
  highlighted_table = []

  adjusted_table = _add_adjusted_col_offsets(table)

  for (row_index, col_index) in cell_indices:
    cell = table[row_index][col_index]
    if with_heuristic_headers:
      row_headers = _get_heuristic_row_headers(adjusted_table, row_index,
                                               col_index)
      col_headers = _get_heuristic_col_headers(adjusted_table, row_index,
                                               col_index)
    else:
      row_headers = []
      col_headers = []

    highlighted_cell = {
        "cell": cell,
        "row_headers": row_headers,
        "col_headers": col_headers
    }
    highlighted_table.append(highlighted_cell)

  return highlighted_table


def linearize_full_table(table, cell_indices, table_page_title,
                         table_section_title):
  """Linearize full table with localized headers and return a string."""
  table_str = ""
  if table_page_title:
    table_str += "<page_title> " + table_page_title + " </page_title> "
  if table_section_title:
    table_str += "<section_title> " + table_section_title + " </section_title> "

  table_str += "<table> "
  adjusted_table = _add_adjusted_col_offsets(table)
  for r_index, row in enumerate(table):
    row_str = "<row> "
    for c_index, col in enumerate(row):

      row_headers = _get_heuristic_row_headers(adjusted_table, r_index, c_index)
      col_headers = _get_heuristic_col_headers(adjusted_table, r_index, c_index)

      # Distinguish between highlighted and non-highlighted cells.
      if [r_index, c_index] in cell_indices:
        start_cell_marker = "<highlighted_cell> "
        end_cell_marker = "</highlighted_cell> "
      else:
        start_cell_marker = "<cell> "
        end_cell_marker = "</cell> "

      # The value of the cell.
      item_str = start_cell_marker + col["value"] + " "

      # All the column headers associated with this cell.
      for col_header in col_headers:
        item_str += "<col_header> " + col_header["value"] + " </col_header> "

      # All the row headers associated with this cell.
      for row_header in row_headers:
        item_str += "<row_header> " + row_header["value"] + " </row_header> "

      item_str += end_cell_marker
      row_str += item_str

    row_str += "</row> "
    table_str += row_str

  table_str += "</table>"
  if cell_indices:
    assert "<highlighted_cell>" in table_str
  return table_str


def linearize_subtable(subtable, table_page_title, table_section_title):
  """Linearize the highlighted subtable and return a string of its contents."""
  table_str = ""
  if table_page_title:
    table_str += "<page_title> " + table_page_title + " </page_title> "
  if table_section_title:
    table_str += "<section_title> " + table_section_title + " </section_title> "
  table_str += "<table> "

  for item in subtable:
    cell = item["cell"]
    row_headers = item["row_headers"]
    col_headers = item["col_headers"]

    # The value of the cell.
    item_str = "<cell> " + cell["value"] + " "

    # All the column headers associated with this cell.
    for col_header in col_headers:
      item_str += "<col_header> " + col_header["value"] + " </col_header> "

    # All the row headers associated with this cell.
    for row_header in row_headers:
      item_str += "<row_header> " + row_header["value"] + " </row_header> "

    item_str += "</cell> "
    table_str += item_str

  table_str += "</table>"
  return table_str

In [8]:
# from preprocess_utils import get_highlighted_subtable, linearize_subtable

print('→', 'Highlighted Cells')
for (index_row, index_col) in data_sample['highlighted_cells']:
    print(data_sample['table'][index_row][index_col])

print('\n→', 'Linearized (Preprocessed) Cells')
subtable=get_highlighted_subtable(table=data_sample['table'], cell_indices=data_sample['highlighted_cells'], with_heuristic_headers=True)
cells_linearized=linearize_subtable(
    subtable=subtable,
    table_page_title=data_sample['table_page_title'],
    table_section_title=data_sample['table_section_title']
)
print(cells_linearized)

print('\n→', 'Final (Label) Sentence')
for sentence in data_sample['sentence_annotations']:
    print(sentence['final_sentence'])

→ Highlighted Cells
{'value': '4', 'is_header': False, 'column_span': 1, 'row_span': 1}
{'value': 'Camille Lacourt', 'is_header': False, 'column_span': 1, 'row_span': 1}
{'value': '53.08', 'is_header': False, 'column_span': 1, 'row_span': 1}

→ Linearized (Preprocessed) Cells
<page_title> Swimming at the 2012 Summer Olympics – Men's 100 metre backstroke </page_title> <section_title> Final </section_title> <table> <cell> 4 <col_header> Rank </col_header> </cell> <cell> Camille Lacourt <col_header> Name </col_header> </cell> <cell> 53.08 <col_header> Time </col_header> </cell> </table>

→ Final (Label) Sentence
Lacourt was dropped to a fourth-place time in 53.08.


In [9]:
# Prepare for Training
from transformers import T5Tokenizer

# T5 Tokenizer
tokenizer=T5Tokenizer.from_pretrained('t5-small')

# Vocab Size
len(tokenizer)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


tokenizer_config.json:   0%|          | 0.00/2.32k [00:00<?, ?B/s]

spiece.model:   0%|          | 0.00/792k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.39M [00:00<?, ?B/s]

You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


32100

In [10]:
# Add Special Tokens: Table Tags
tokenizer.add_special_tokens({
    'additional_special_tokens': [
        '<page_title>',
        '</page_title>',
        '<section_title>',
        '</section_title>',
        '<table>',
        '</table>',
        '<cell>',
        '</cell>',
        '<col_header>',
        '</col_header>',
        '<row_header>',
        '</row_header>'
    ]
})
# When Training, Resize PLM's Embedding Layer
# model.resize_token_embeddings(len(tokenizer))

# Vocab Size
len(tokenizer)

32112

In [11]:
# Tokenize Linearized Cells
print(tokenizer.tokenize(cells_linearized))

['<page_title>', '▁Swimming', '▁at', '▁the', '▁2012', '▁Summer', '▁Olympics', '▁', '–', '▁Men', "'", 's', '▁100', '▁', 'metre', '▁back', 'stroke', '</page_title>', '<section_title>', '▁Final', '</section_title>', '<table>', '<cell>', '▁4', '<col_header>', '▁', 'Rank', '</col_header>', '</cell>', '<cell>', '▁Camill', 'e', '▁La', 'court', '<col_header>', '▁Name', '</col_header>', '</cell>', '<cell>', '▁53', '.', '08', '<col_header>', '▁Time', '</col_header>', '</cell>', '</table>']


### **2. Finetuning (t5-base)**

In [12]:
import json

import torch
from torch.utils.data import Dataset, DataLoader
from torch.utils.tensorboard import SummaryWriter

from transformers import T5Tokenizer, T5ForConditionalGeneration, AdamW, get_linear_schedule_with_warmup

# Google's Official Preprocess Codes
# https://github.com/google-research/language/blob/master/language/totto/baseline_preprocessing/preprocess_utils.py
# from preprocess_utils import get_highlighted_subtable, linearize_subtable

In [14]:
# Train Config
device=torch.device('cuda:0')
lr=1e-4
batch_size=16 # 3 for 't5-large' and make 'accumulation_steps' larger
accumulation_steps=2
epochs=10

In [15]:
# Pre-Trained T5 Tokenizer
tokenizer=T5Tokenizer.from_pretrained('t5-base')
# Add Special Tokens: Table Tags
tokenizer.add_special_tokens({
    'additional_special_tokens': [
        '<page_title>',
        '</page_title>',
        '<section_title>',
        '</section_title>',
        '<table>',
        '</table>',
        '<cell>',
        '</cell>',
        '<col_header>',
        '</col_header>',
        '<row_header>',
        '</row_header>'
    ]
})

spiece.model:   0%|          | 0.00/792k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.39M [00:00<?, ?B/s]

config.json:   0%|          | 0.00/1.21k [00:00<?, ?B/s]

For now, this behavior is kept to avoid breaking backwards compatibility when padding/encoding with `truncation is True`.
- Be aware that you SHOULD NOT rely on t5-base automatically truncating your input to 512 when padding/encoding.
- If you want to encode/pad to sequences longer than 512 you can either instantiate this tokenizer with `model_max_length` or pass `max_length` when encoding/padding.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


12

In [16]:
class ToTToDataset(Dataset):
    def __init__(self, path_data, tokenizer):
        #
        self.data=[]
        self.label=[]

        # Load Dataset
        with open(path_data, 'r') as f:
            dataset=f.read().splitlines()
            f.close()

        for _data in dataset:
            data=json.loads(_data)

            # Preprocess
            subtable=get_highlighted_subtable(table=data['table'], cell_indices=data['highlighted_cells'], with_heuristic_headers=True)
            cells_linearized=linearize_subtable(
                subtable=subtable,
                table_page_title=data['table_page_title'],
                table_section_title=data['table_section_title']
            )

            # Encode
            encoded=tokenizer.encode(cells_linearized)
            if len(encoded)>512:
                # Truncate
                encoded=encoded[:511]+[tokenizer.eos_token_id]
            self.data.append(encoded)
            self.label.append(tokenizer.encode(data['sentence_annotations'][0]['final_sentence']))

        print(len(self.data), 'datas')
        print(len(self.label), 'labels')

    def __getitem__(self, idx):
        return self.data[idx], self.label[idx]

    def __len__(self):
        return len(self.data)

In [17]:
def collate_fn(batch):
    """
    Same Sequence Length on Same Batch
    """
    max_len_data=0
    max_len_label=0
    for data, label in batch:
        if len(data)>max_len_data: max_len_data=len(data)
        if len(label)>max_len_label: max_len_label=len(label)

    datas=[]
    attn_masks=[]
    labels=[]
    for data, label in batch:
        data.extend([tokenizer.pad_token_id]*(max_len_data-len(data)))
        datas.append(data)

        attn_mask=[int(e!=tokenizer.pad_token_id) for e in data]
        attn_masks.append(attn_mask)

        label.extend([-100]*(max_len_label-len(label)))
        labels.append(label)

    return torch.tensor(datas), torch.tensor(attn_masks), torch.tensor(labels)

In [18]:
# Pre-Trained T5 Model
model=T5ForConditionalGeneration.from_pretrained('t5-small')
# Resize PLM's Embedding Layer
model.resize_token_embeddings(len(tokenizer))

config.json:   0%|          | 0.00/1.21k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/242M [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/147 [00:00<?, ?B/s]

Embedding(32112, 512)

In [19]:
print(model)

T5ForConditionalGeneration(
  (shared): Embedding(32112, 512)
  (encoder): T5Stack(
    (embed_tokens): Embedding(32112, 512)
    (block): ModuleList(
      (0): T5Block(
        (layer): ModuleList(
          (0): T5LayerSelfAttention(
            (SelfAttention): T5Attention(
              (q): Linear(in_features=512, out_features=512, bias=False)
              (k): Linear(in_features=512, out_features=512, bias=False)
              (v): Linear(in_features=512, out_features=512, bias=False)
              (o): Linear(in_features=512, out_features=512, bias=False)
              (relative_attention_bias): Embedding(32, 8)
            )
            (layer_norm): T5LayerNorm()
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (1): T5LayerFF(
            (DenseReluDense): T5DenseActDense(
              (wi): Linear(in_features=512, out_features=2048, bias=False)
              (wo): Linear(in_features=2048, out_features=512, bias=False)
              (dropout): Drop

In [20]:
# dataset_train=ToTToDataset(path_data='/content/totto_data/totto_train_data.jsonl', tokenizer=tokenizer)
dataset_train=ToTToDataset(path_data='/content/drive/MyDrive/ToTTo_data/totto_train_data.jsonl', tokenizer=tokenizer)
dataloader_train=DataLoader(dataset_train, batch_size=batch_size, shuffle=True, collate_fn=collate_fn)

Token indices sequence length is longer than the specified maximum sequence length for this model (578 > 512). Running this sequence through the model will result in indexing errors


120761 datas
120761 labels


In [21]:
# Optim, Scheduler
optimizer=AdamW(model.parameters(), lr=lr)
scheduler=get_linear_schedule_with_warmup(
    optimizer=optimizer,
    num_warmup_steps=1000,
    num_training_steps=int(epochs*len(dataset_train)/(accumulation_steps*batch_size))
)



In [22]:
from tqdm import tqdm

step_global=0

for epoch in range(epochs):
    # Train Phase
    model.train()
    model.to(device)

    loss_train = 0
    optimizer.zero_grad()

    tqdm_dataloader_train = tqdm(dataloader_train, desc=f'Epoch {epoch + 1}')

    for step, (data, attn_mask, label) in enumerate(tqdm_dataloader_train):
        data = data.to(device)
        attn_mask = attn_mask.to(device)
        label = label.to(device)

        outputs = model(input_ids=data, attention_mask=attn_mask, labels=label)

        loss = outputs[0] / accumulation_steps
        loss.backward()

        loss_train += loss.item()

        if (step + 1) % accumulation_steps == 0:
            step_global += 1

            # Console
            if step_global % 1000 == 0:
                print(f'\n Epoch {epoch + 1}  Step {step_global} Train loss {loss_train:.4f}')
            # Set Loss to 0
            loss_train = 0

            optimizer.step()
            scheduler.step()

            optimizer.zero_grad()

    # Save Model
    model.to(torch.device('cpu'))
    torch.save(model.state_dict(), f'/content/drive/MyDrive/ToTTo_T5-small/model/T5-base_Fine-Tuning_lr{lr}_batch{int(accumulation_steps*batch_size)}_epoch{epoch+1}.pth')


Epoch 1:  26%|██▋       | 2000/7548 [08:02<27:39,  3.34it/s]


 Epoch 1  Step 1000 Train loss 1.5196


Epoch 1:  53%|█████▎    | 4001/7548 [16:05<12:18,  4.80it/s]


 Epoch 1  Step 2000 Train loss 1.6579


Epoch 1:  79%|███████▉  | 6000/7548 [24:06<04:43,  5.47it/s]


 Epoch 1  Step 3000 Train loss 1.3284


Epoch 1: 100%|██████████| 7548/7548 [30:24<00:00,  4.14it/s]
Epoch 2:   6%|▌         | 452/7548 [03:56<1:02:57,  1.88it/s]


 Epoch 2  Step 4000 Train loss 1.3600


Epoch 2:  32%|███▏      | 2452/7548 [21:23<46:24,  1.83it/s]


 Epoch 2  Step 5000 Train loss 1.7001


Epoch 2:  59%|█████▉    | 4452/7548 [38:54<27:22,  1.88it/s]


 Epoch 2  Step 6000 Train loss 1.1717


Epoch 2:  85%|████████▌ | 6452/7548 [56:17<09:07,  2.00it/s]


 Epoch 2  Step 7000 Train loss 1.1923


Epoch 2: 100%|██████████| 7548/7548 [1:05:53<00:00,  1.91it/s]
Epoch 3:  12%|█▏        | 904/7548 [08:39<1:05:55,  1.68it/s]


 Epoch 3  Step 8000 Train loss 1.1468


Epoch 3:  38%|███▊      | 2904/7548 [27:49<44:56,  1.72it/s]


 Epoch 3  Step 9000 Train loss 1.3230


Epoch 3:  65%|██████▍   | 4904/7548 [46:59<25:31,  1.73it/s]


 Epoch 3  Step 10000 Train loss 1.2396


Epoch 3:  91%|█████████▏| 6904/7548 [1:06:09<06:14,  1.72it/s]


 Epoch 3  Step 11000 Train loss 1.3539


Epoch 3: 100%|██████████| 7548/7548 [1:12:20<00:00,  1.74it/s]
Epoch 4:  18%|█▊        | 1356/7548 [13:47<1:02:58,  1.64it/s]


 Epoch 4  Step 12000 Train loss 1.4253


Epoch 4:  44%|████▍     | 3356/7548 [34:07<44:19,  1.58it/s]


 Epoch 4  Step 13000 Train loss 0.9973


Epoch 4:  71%|███████   | 5356/7548 [54:26<22:01,  1.66it/s]


 Epoch 4  Step 14000 Train loss 1.3200


Epoch 4:  97%|█████████▋| 7356/7548 [1:14:46<01:59,  1.61it/s]


 Epoch 4  Step 15000 Train loss 1.1626


Epoch 4: 100%|██████████| 7548/7548 [1:16:42<00:00,  1.64it/s]
Epoch 5:  24%|██▍       | 1808/7548 [19:04<1:01:02,  1.57it/s]


 Epoch 5  Step 16000 Train loss 1.0635


Epoch 5:  50%|█████     | 3808/7548 [40:10<38:58,  1.60it/s]


 Epoch 5  Step 17000 Train loss 1.2313


Epoch 5:  77%|███████▋  | 5808/7548 [1:01:15<18:05,  1.60it/s]


 Epoch 5  Step 18000 Train loss 1.1560


Epoch 5: 100%|██████████| 7548/7548 [1:19:36<00:00,  1.58it/s]
Epoch 6:   3%|▎         | 260/7548 [02:44<1:15:56,  1.60it/s]


 Epoch 6  Step 19000 Train loss 1.1139


Epoch 6:  30%|██▉       | 2260/7548 [23:50<55:08,  1.60it/s]


 Epoch 6  Step 20000 Train loss 1.2756


Epoch 6:  56%|█████▋    | 4260/7548 [44:56<35:21,  1.55it/s]


 Epoch 6  Step 21000 Train loss 1.1645


Epoch 6:  83%|████████▎ | 6260/7548 [1:06:02<13:40,  1.57it/s]


 Epoch 6  Step 22000 Train loss 0.9513


Epoch 6: 100%|██████████| 7548/7548 [1:19:36<00:00,  1.58it/s]
Epoch 7:   9%|▉         | 712/7548 [07:30<1:12:44,  1.57it/s]


 Epoch 7  Step 23000 Train loss 1.1066


Epoch 7:  36%|███▌      | 2712/7548 [28:36<50:53,  1.58it/s]


 Epoch 7  Step 24000 Train loss 1.1399


Epoch 7:  62%|██████▏   | 4712/7548 [49:42<29:31,  1.60it/s]


 Epoch 7  Step 25000 Train loss 1.2580


Epoch 7:  89%|████████▉ | 6712/7548 [1:10:47<08:53,  1.57it/s]


 Epoch 7  Step 26000 Train loss 1.2372


Epoch 7: 100%|██████████| 7548/7548 [1:19:36<00:00,  1.58it/s]
Epoch 8:  15%|█▌        | 1164/7548 [12:16<1:07:39,  1.57it/s]


 Epoch 8  Step 27000 Train loss 1.0903


Epoch 8:  42%|████▏     | 3164/7548 [33:22<46:44,  1.56it/s]


 Epoch 8  Step 28000 Train loss 1.1457


Epoch 8:  68%|██████▊   | 5164/7548 [54:27<25:24,  1.56it/s]


 Epoch 8  Step 29000 Train loss 0.9260


Epoch 8:  95%|█████████▍| 7164/7548 [1:15:33<04:05,  1.57it/s]


 Epoch 8  Step 30000 Train loss 1.0127


Epoch 8: 100%|██████████| 7548/7548 [1:19:36<00:00,  1.58it/s]
Epoch 9:  21%|██▏       | 1616/7548 [17:02<1:03:15,  1.56it/s]


 Epoch 9  Step 31000 Train loss 1.1318


Epoch 9:  48%|████▊     | 3616/7548 [38:07<41:49,  1.57it/s]


 Epoch 9  Step 32000 Train loss 1.0146


Epoch 9:  74%|███████▍  | 5616/7548 [59:13<20:36,  1.56it/s]


 Epoch 9  Step 33000 Train loss 1.0781


Epoch 9: 100%|██████████| 7548/7548 [1:19:36<00:00,  1.58it/s]
Epoch 10:   1%|          | 68/7548 [00:42<1:17:57,  1.60it/s]


 Epoch 10  Step 34000 Train loss 1.0097


Epoch 10:  27%|██▋       | 2068/7548 [21:49<58:15,  1.57it/s]


 Epoch 10  Step 35000 Train loss 0.9274


Epoch 10:  54%|█████▍    | 4068/7548 [42:54<37:06,  1.56it/s]


 Epoch 10  Step 36000 Train loss 1.0299


Epoch 10:  80%|████████  | 6068/7548 [1:03:59<15:24,  1.60it/s]


 Epoch 10  Step 37000 Train loss 1.1739


Epoch 10: 100%|██████████| 7548/7548 [1:19:35<00:00,  1.58it/s]
