# **Homework 7 - Bert (Question Answering)**

If you have any questions, feel free to email us at mlta-2022-spring@googlegroups.com



Slide:    [Link](https://docs.google.com/presentation/d/1H5ZONrb2LMOCixLY7D5_5-7LkIaXO6AGEaV2mRdTOMY/edit?usp=sharing)　Kaggle: [Link](https://www.kaggle.com/c/ml2022spring-hw7)　Data: [Link](https://drive.google.com/uc?id=1AVgZvy3VFeg0fX-6WQJMHPVrx3A-M1kb)




## Task description
- Chinese Extractive Question Answering
  - Input: Paragraph + Question
  - Output: Answer

- Objective: Learn how to fine tune a pretrained model on downstream task using transformers

- Todo
    - Fine tune a pretrained chinese BERT model
    - Change hyperparameters (e.g. doc_stride)
    - Apply linear learning rate decay
    - Try other pretrained models
    - Improve preprocessing
    - Improve postprocessing
- Training tips
    - Automatic mixed precision
    - Gradient accumulation
    - Ensemble

- Estimated training time (tesla t4 with automatic mixed precision enabled)
    - Simple: 8mins
    - Medium: 8mins
    - Strong: 25mins
    - Boss: 2.5hrs
  

## Download Dataset

In [1]:
# # Download link 1
# !gdown --id '1AVgZvy3VFeg0fX-6WQJMHPVrx3A-M1kb' --output hw7_data.zip

# # Download Link 2 (if the above link fails) 
# # !gdown --id '1qwjbRjq481lHsnTrrF4OjKQnxzgoLEFR' --output hw7_data.zip

# # Download Link 3 (if the above link fails) 
# # !gdown --id '1QXuWjNRZH6DscSd6QcRER0cnxmpZvijn' --output hw7_data.zip

# !unzip -o hw7_data.zip

# For this HW, K80 < P4 < T4 < P100 <= T4(fp16) < V100
!nvidia-smi

Sat Apr 23 09:59:36 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.29.05    Driver Version: 495.29.05    CUDA Version: 11.5     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  NVIDIA GeForce ...  On   | 00000000:65:00.0  On |                  N/A |
|  0%   40C    P0    66W / 275W |    230MiB / 11177MiB |     13%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

## Install transformers

Documentation for the toolkit:　https://huggingface.co/transformers/

In [2]:
# You are allowed to change version of transformers or use other toolkits
# !pip install transformers==4.5.0

## Import Packages

In [3]:
import json
import numpy as np
import random
import torch
from torch.utils.data import DataLoader, Dataset 
from transformers import AdamW, BertForQuestionAnswering, BertTokenizerFast
from tqdm.auto import tqdm
import transformers


device = "cuda" if torch.cuda.is_available() else "cpu"


# Fix random seed for reproducibility
def same_seeds(seed):
	  torch.manual_seed(seed)
	  if torch.cuda.is_available():
		    torch.cuda.manual_seed(seed)
		    torch.cuda.manual_seed_all(seed)
	  np.random.seed(seed)
	  random.seed(seed)
	  torch.backends.cudnn.benchmark = False
	  torch.backends.cudnn.deterministic = True


same_seeds(0)


In [4]:
# Change "fp16_training" to True to support automatic mixed precision training (fp16)	
fp16_training = False

if fp16_training:
    !pip install accelerate==0.2.0
    from accelerate import Accelerator
    accelerator = Accelerator(fp16=True)
    device = accelerator.device

# Documentation for the toolkit:  https://huggingface.co/docs/accelerate/

## Load Model and Tokenizer




 

In [5]:
# model = BertForQuestionAnswering.from_pretrained("uer/roberta-base-chinese-extractive-qa").to(device)
# tokenizer = BertTokenizerFast.from_pretrained("uer/roberta-base-chinese-extractive-qa")

model = BertForQuestionAnswering.from_pretrained("hfl/chinese-macbert-large").to(device)
tokenizer = BertTokenizerFast.from_pretrained("hfl/chinese-macbert-large")

# You can safely ignore the warning message (it pops up because new prediction heads for QA are initialized randomly)

Some weights of the model checkpoint at hfl/chinese-macbert-large were not used when initializing BertForQuestionAnswering: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.predictions.decoder.bias', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertForQuestionAnswering from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForQuestionAnswering from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForQuestionAnswering were not initialized from the

## Read Data

- Training set: 31690 QA pairs
- Dev set: 4131  QA pairs
- Test set: 4957  QA pairs

- {train/dev/test}_questions:	
  - List of dicts with the following keys:
   - id (int)
   - paragraph_id (int)
   - question_text (string)
   - answer_text (string)
   - answer_start (int)
   - answer_end (int)
- {train/dev/test}_paragraphs: 
  - List of strings
  - paragraph_ids in questions correspond to indexs in paragraphs
  - A paragraph may be used by several questions 

In [6]:
def read_data(file):
    with open(file, 'r', encoding="utf-8") as reader:
        data = json.load(reader)
    return data["questions"], data["paragraphs"]

train_questions, train_paragraphs = read_data("hw7_train.json")
dev_questions, dev_paragraphs = read_data("hw7_dev.json")
test_questions, test_paragraphs = read_data("hw7_test.json")

## Tokenize Data

In [7]:
# Tokenize questions and paragraphs separately
# 「add_special_tokens」 is set to False since special tokens will be added when tokenized questions and paragraphs are combined in datset __getitem__ 

train_questions_tokenized = tokenizer([train_question["question_text"] for train_question in train_questions], add_special_tokens=False)
dev_questions_tokenized = tokenizer([dev_question["question_text"] for dev_question in dev_questions], add_special_tokens=False)
test_questions_tokenized = tokenizer([test_question["question_text"] for test_question in test_questions], add_special_tokens=False) 

train_paragraphs_tokenized = tokenizer(train_paragraphs, add_special_tokens=False)
dev_paragraphs_tokenized = tokenizer(dev_paragraphs, add_special_tokens=False)
test_paragraphs_tokenized = tokenizer(test_paragraphs, add_special_tokens=False)

# You can safely ignore the warning message as tokenized sequences will be futher processed in datset __getitem__ before passing to model

## Dataset and Dataloader

In [8]:
class QA_Dataset(Dataset):
    def __init__(self, split, questions, tokenized_questions, tokenized_paragraphs):
        self.split = split
        self.questions = questions
        self.tokenized_questions = tokenized_questions
        self.tokenized_paragraphs = tokenized_paragraphs
        self.max_question_len = 40
        self.max_paragraph_len = 200
        # self.max_paragraph_len = 150

        
        ##### TODO: Change value of doc_stride #####
        self.doc_stride = 50
        # self.doc_stride = 150

        # Input sequence length = [CLS] + question + [SEP] + paragraph + [SEP]
        self.max_seq_len = 1 + self.max_question_len + 1 + self.max_paragraph_len + 1

    def __len__(self):
        return len(self.questions)

    def __getitem__(self, idx):
        question = self.questions[idx]
        tokenized_question = self.tokenized_questions[idx]
        tokenized_paragraph = self.tokenized_paragraphs[question["paragraph_id"]]

        ##### TODO: Preprocessing #####
        # Hint: How to prevent model from learning something it should not learn

        if self.split == "train":
            # Convert answer's start/end positions in paragraph_text to start/end positions in tokenized_paragraph  
            answer_start_token = tokenized_paragraph.char_to_token(question["answer_start"])
            answer_end_token = tokenized_paragraph.char_to_token(question["answer_end"])

            # A single window is obtained by slicing the portion of paragraph containing the answer
            # mid = int((answer_start_token + answer_end_token) // (2+random.uniform(-1, 1)))
            mid = int((answer_start_token + answer_end_token) // (2 + 0.25*np.random.standard_normal()))

            paragraph_start = max(0, min(mid - self.max_paragraph_len // 2, len(tokenized_paragraph) - self.max_paragraph_len))
            paragraph_end = paragraph_start + self.max_paragraph_len
            
            # Slice question/paragraph and add special tokens (101: CLS, 102: SEP)
            input_ids_question = [101] + tokenized_question.ids[:self.max_question_len] + [102] 
            input_ids_paragraph = tokenized_paragraph.ids[paragraph_start : paragraph_end] + [102]		
            
            # Convert answer's start/end positions in tokenized_paragraph to start/end positions in the window  
            answer_start_token += len(input_ids_question) - paragraph_start
            answer_end_token += len(input_ids_question) - paragraph_start
            
            # Pad sequence and obtain inputs to model 
            input_ids, token_type_ids, attention_mask = self.padding(input_ids_question, input_ids_paragraph)
            return torch.tensor(input_ids), torch.tensor(token_type_ids), torch.tensor(attention_mask), answer_start_token, answer_end_token

        # Validation/Testing
        else:
            input_ids_list, token_type_ids_list, attention_mask_list = [], [], []
            
            # Paragraph is split into several windows, each with start positions separated by step "doc_stride"
            for i in range(0, len(tokenized_paragraph), self.doc_stride):
                
                # Slice question/paragraph and add special tokens (101: CLS, 102: SEP)
                input_ids_question = [101] + tokenized_question.ids[:self.max_question_len] + [102]
                input_ids_paragraph = tokenized_paragraph.ids[i : i + self.max_paragraph_len] + [102]
                
                # Pad sequence and obtain inputs to model
                input_ids, token_type_ids, attention_mask = self.padding(input_ids_question, input_ids_paragraph)
                
                input_ids_list.append(input_ids)
                token_type_ids_list.append(token_type_ids)
                attention_mask_list.append(attention_mask)
            
            return torch.tensor(input_ids_list), torch.tensor(token_type_ids_list), torch.tensor(attention_mask_list)

    def padding(self, input_ids_question, input_ids_paragraph):
        # Pad zeros if sequence length is shorter than max_seq_len
        padding_len = self.max_seq_len - len(input_ids_question) - len(input_ids_paragraph)
        # Indices of input sequence tokens in the vocabulary
        input_ids = input_ids_question + input_ids_paragraph + [0] * padding_len
        # Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]
        token_type_ids = [0] * len(input_ids_question) + [1] * len(input_ids_paragraph) + [0] * padding_len
        # Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]
        attention_mask = [1] * (len(input_ids_question) + len(input_ids_paragraph)) + [0] * padding_len
        
        return input_ids, token_type_ids, attention_mask

train_set = QA_Dataset("train", train_questions, train_questions_tokenized, train_paragraphs_tokenized)
dev_set = QA_Dataset("dev", dev_questions, dev_questions_tokenized, dev_paragraphs_tokenized)
test_set = QA_Dataset("test", test_questions, test_questions_tokenized, test_paragraphs_tokenized)

train_batch_size = 4

# Note: Do NOT change batch size of dev_loader / test_loader !
# Although batch size=1, it is actually a batch consisting of several windows from the same QA pair
train_loader = DataLoader(train_set, batch_size=train_batch_size, shuffle=True, pin_memory=True)
dev_loader = DataLoader(dev_set, batch_size=1, shuffle=False, pin_memory=True)
test_loader = DataLoader(test_set, batch_size=1, shuffle=False, pin_memory=True)

## Function for Evaluation

In [9]:
def index_tokenize(tokens, start, end):
    char_count, new_start, new_end = 0, 512, 512
    start_flag = False  
    # print("start:", start)
    # print("end:", end)
    for i, token in enumerate(tokens):
        # print("i:",i, " token:", token)
        if token == '[UNK]' or token == '[CLS]' or token == '[SEP]':
            if i == start:
                new_start = char_count
                # print("new_start1:", new_start)

            if i == end:
                new_end = char_count
                return new_start, new_end
                
            char_count += 1
            # print("char_count1:",char_count)
        else:
            for ch in token:
                if i == start and start_flag == False:
                    new_start = char_count
                    start_flag = True
                    # print("new_start2:", new_start)

                if i == end:
                    new_end = char_count
                    return new_start, new_end

                if ch != '#':
                    char_count += 1
                # print("char_count2:",char_count)

    return start, end              

In [10]:
def evaluate(data, output, doc_stride=150, paragraph=None, paragraph_tokenized=None):
    ##### TODO: Postprocessing #####
    # There is a bug and room for improvement in postprocessing 
    # Hint: Open your prediction file to see what is wrong 
    
    answer = ''
    max_prob = float('-inf')
    num_of_windows = data[0].shape[1]
    entire_start_index = 0
    entire_end_index = 0

    # print(data)
    for k in range(num_of_windows):
        # print('window', k)
        # Obtain answer by choosing the most probable start position / end position
        mask = data[1][0][k].bool() & data[2][0][k].bool() # get document, token_type_ids & attention_mask
        mask = mask.to(device)

        masked_output_start = torch.masked_select(output.start_logits[k], mask)[:-1] # last one is [SEP]
        start_prob, start_index = torch.max(masked_output_start, dim=0)

        masked_output_start = torch.masked_select(output.end_logits[k], mask)[:-1] # last one is [SEP]
        end_prob, end_index = torch.max(masked_output_start, dim=0)
        
        # Probability of answer is calculated as sum of start_prob and end_prob
        prob = start_prob + end_prob
        masked_data = torch.masked_select(data[0][0][k].to(device), mask)[:-1]
        
        # Replace answer if calculated probability is larger than previous windows
        if (prob > max_prob) and (end_index - start_index <= 20) and (end_index > start_index):
            max_prob = prob
            entire_start_index = start_index.item() + doc_stride * k
            entire_end_index = end_index.item() + doc_stride * k
            # print("entire_start_index", entire_start_index)
            # print("entire_end_index", entire_end_index)
            # Convert tokens to chars (e.g. [1920, 7032] --> "大 金")
            answer = tokenizer.decode(masked_data[start_index : end_index + 1])
    # 若 [UNK] 在 prediction，使用原始的 paragraph
    if '[UNK]' in answer:
        print('found [UNK] in prediction.')
        print('original pred:', answer)

        new_start, new_end = index_tokenize(tokens=paragraph_tokenized, start=entire_start_index, end=entire_end_index)
        # print("new_start",new_start)
        # print("new_end", new_end)

        answer = paragraph[new_start:new_end+1]
        print('final prediction',answer)

    # Remove spaces in answer (e.g. "大 金" --> "大金")
    return answer.replace(' ','')

## Training

In [11]:
num_epoch = 5
validation = True
logging_step = 100
learning_rate = 1e-5
doc_stride = 50
# doc_stride = 150
accum_iter = 4


optimizer = AdamW(model.parameters(), lr=learning_rate)

num_training_steps = len(train_loader) * num_epoch

# scheudler = transformers.get_polynomial_decay_schedule_with_warmup(optimizer, num_warmup_steps=0, num_training_steps=num_training_steps, lr_end = 1e-07, power = 1.0, last_epoch = -1)
scheudler = transformers.get_cosine_with_hard_restarts_schedule_with_warmup(optimizer, num_warmup_steps = 0, num_training_steps = num_training_steps)

if fp16_training:
    model, optimizer, train_loader = accelerator.prepare(model, optimizer, train_loader) 

model.train()

print("Start Training ...")

for epoch in range(num_epoch):
    step = 1
    train_loss = train_acc = 0
    
    for idx, data in enumerate(tqdm(train_loader)):	
        # Load all data into GPU
        data = [i.to(device) for i in data]
        
        # Model inputs: input_ids, token_type_ids, attention_mask, start_positions, end_positions (Note: only "input_ids" is mandatory)
        # Model outputs: start_logits, end_logits, loss (return when start_positions/end_positions are provided)  
        output = model(input_ids=data[0], token_type_ids=data[1], attention_mask=data[2], start_positions=data[3], end_positions=data[4])

        # Choose the most probable start position / end position
        start_index = torch.argmax(output.start_logits, dim=1)
        end_index = torch.argmax(output.end_logits, dim=1)
        
        # Prediction is correct only if both start_index and end_index are correct
        train_acc += ((start_index == data[3]) & (end_index == data[4])).float().mean()
        train_loss += output.loss
        output.loss = output.loss / accum_iter
        
        if fp16_training:
            accelerator.backward(output.loss)
        else:
            output.loss.backward()

        
        if ((idx + 1) % accum_iter == 0) or (idx + 1 == len(train_loader)):
            optimizer.step()
            scheudler.step()
            optimizer.zero_grad()
        step += 1

        ##### TODO: Apply linear learning rate decay #####
        
        
        # Print training loss and accuracy over past logging step
        if step % logging_step == 0:
            print(f"Epoch {epoch + 1} | Step {step} | loss = {train_loss.item() / logging_step:.3f}, acc = {train_acc / logging_step:.3f}")
            train_loss = train_acc = 0

    if validation:
        print("Evaluating Dev Set ...")
        model.eval()
        with torch.no_grad():
            dev_acc = 0
            for i, data in enumerate(tqdm(dev_loader)):
                output = model(input_ids=data[0].squeeze(dim=0).to(device), token_type_ids=data[1].squeeze(dim=0).to(device),
                       attention_mask=data[2].squeeze(dim=0).to(device))
                # prediction is correct only if answer text exactly matches
                dev_acc += evaluate(data, output, doc_stride, dev_paragraphs[dev_questions[i]['paragraph_id']], 
                    dev_paragraphs_tokenized[dev_questions[i]['paragraph_id']].tokens) == dev_questions[i]["answer_text"]
            print(f"Validation | Epoch {epoch + 1} | acc = {dev_acc / len(dev_loader):.3f}")
        model.train()

# Save a model and its configuration file to the directory 「saved_model」 
# i.e. there are two files under the direcory 「saved_model」: 「pytorch_model.bin」 and 「config.json」
# Saved model can be re-loaded using 「model = BertForQuestionAnswering.from_pretrained("saved_model")」
print("Saving Model ...")
model_save_dir = "saved_model" 
model.save_pretrained(model_save_dir)

Start Training ...


  1%|          | 99/7923 [00:35<52:15,  2.50it/s]

Epoch 1 | Step 100 | loss = 4.756, acc = 0.030


  3%|▎         | 199/7923 [01:12<52:18,  2.46it/s]

Epoch 1 | Step 200 | loss = 2.755, acc = 0.207


  4%|▍         | 299/7923 [01:48<51:10,  2.48it/s]

Epoch 1 | Step 300 | loss = 1.726, acc = 0.397


  5%|▌         | 399/7923 [02:25<50:54,  2.46it/s]

Epoch 1 | Step 400 | loss = 1.376, acc = 0.470


  6%|▋         | 499/7923 [03:02<50:56,  2.43it/s]

Epoch 1 | Step 500 | loss = 1.053, acc = 0.597


  8%|▊         | 599/7923 [03:39<49:45,  2.45it/s]

Epoch 1 | Step 600 | loss = 0.995, acc = 0.618


  9%|▉         | 699/7923 [04:15<49:24,  2.44it/s]

Epoch 1 | Step 700 | loss = 0.937, acc = 0.657


 10%|█         | 799/7923 [04:52<47:37,  2.49it/s]

Epoch 1 | Step 800 | loss = 0.829, acc = 0.683


 11%|█▏        | 899/7923 [05:28<47:09,  2.48it/s]

Epoch 1 | Step 900 | loss = 0.914, acc = 0.650


 13%|█▎        | 999/7923 [06:05<46:09,  2.50it/s]

Epoch 1 | Step 1000 | loss = 0.759, acc = 0.725


 14%|█▍        | 1099/7923 [06:41<45:06,  2.52it/s]

Epoch 1 | Step 1100 | loss = 0.754, acc = 0.727


 15%|█▌        | 1199/7923 [07:17<45:07,  2.48it/s]

Epoch 1 | Step 1200 | loss = 0.765, acc = 0.705


 16%|█▋        | 1299/7923 [07:53<43:20,  2.55it/s]

Epoch 1 | Step 1300 | loss = 0.724, acc = 0.730


 18%|█▊        | 1399/7923 [08:28<45:00,  2.42it/s]

Epoch 1 | Step 1400 | loss = 0.874, acc = 0.683


 19%|█▉        | 1499/7923 [09:05<42:47,  2.50it/s]

Epoch 1 | Step 1500 | loss = 0.720, acc = 0.725


 20%|██        | 1599/7923 [09:41<42:27,  2.48it/s]

Epoch 1 | Step 1600 | loss = 0.639, acc = 0.743


 21%|██▏       | 1699/7923 [10:18<42:48,  2.42it/s]

Epoch 1 | Step 1700 | loss = 0.694, acc = 0.717


 23%|██▎       | 1799/7923 [10:54<41:22,  2.47it/s]

Epoch 1 | Step 1800 | loss = 0.678, acc = 0.735


 24%|██▍       | 1899/7923 [11:31<38:52,  2.58it/s]

Epoch 1 | Step 1900 | loss = 0.651, acc = 0.727


 25%|██▌       | 1999/7923 [12:08<39:32,  2.50it/s]

Epoch 1 | Step 2000 | loss = 0.755, acc = 0.743


 26%|██▋       | 2099/7923 [12:45<41:17,  2.35it/s]

Epoch 1 | Step 2100 | loss = 0.792, acc = 0.727


 28%|██▊       | 2199/7923 [13:22<38:20,  2.49it/s]

Epoch 1 | Step 2200 | loss = 0.762, acc = 0.750


 29%|██▉       | 2299/7923 [13:58<39:25,  2.38it/s]

Epoch 1 | Step 2300 | loss = 0.697, acc = 0.725


 30%|███       | 2399/7923 [14:35<36:54,  2.49it/s]

Epoch 1 | Step 2400 | loss = 0.710, acc = 0.700


 32%|███▏      | 2499/7923 [15:12<36:59,  2.44it/s]

Epoch 1 | Step 2500 | loss = 0.612, acc = 0.770


 33%|███▎      | 2599/7923 [15:48<35:47,  2.48it/s]

Epoch 1 | Step 2600 | loss = 0.673, acc = 0.735


 34%|███▍      | 2699/7923 [16:25<35:29,  2.45it/s]

Epoch 1 | Step 2700 | loss = 0.745, acc = 0.743


 35%|███▌      | 2799/7923 [17:01<34:11,  2.50it/s]

Epoch 1 | Step 2800 | loss = 0.640, acc = 0.715


 37%|███▋      | 2899/7923 [17:37<33:15,  2.52it/s]

Epoch 1 | Step 2900 | loss = 0.675, acc = 0.707


 38%|███▊      | 2999/7923 [18:13<32:25,  2.53it/s]

Epoch 1 | Step 3000 | loss = 0.614, acc = 0.745


 39%|███▉      | 3099/7923 [18:49<32:02,  2.51it/s]

Epoch 1 | Step 3100 | loss = 0.659, acc = 0.730


 40%|████      | 3199/7923 [19:25<31:56,  2.46it/s]

Epoch 1 | Step 3200 | loss = 0.663, acc = 0.777


 42%|████▏     | 3299/7923 [20:02<31:46,  2.43it/s]

Epoch 1 | Step 3300 | loss = 0.553, acc = 0.765


 43%|████▎     | 3399/7923 [20:38<30:24,  2.48it/s]

Epoch 1 | Step 3400 | loss = 0.679, acc = 0.720


 44%|████▍     | 3499/7923 [21:15<30:33,  2.41it/s]

Epoch 1 | Step 3500 | loss = 0.610, acc = 0.740


 45%|████▌     | 3599/7923 [21:52<28:47,  2.50it/s]

Epoch 1 | Step 3600 | loss = 0.543, acc = 0.772


 47%|████▋     | 3699/7923 [22:29<28:09,  2.50it/s]

Epoch 1 | Step 3700 | loss = 0.659, acc = 0.750


 48%|████▊     | 3799/7923 [23:06<26:53,  2.56it/s]

Epoch 1 | Step 3800 | loss = 0.637, acc = 0.755


 49%|████▉     | 3899/7923 [23:43<27:52,  2.41it/s]

Epoch 1 | Step 3900 | loss = 0.516, acc = 0.782


 50%|█████     | 3999/7923 [24:19<27:13,  2.40it/s]

Epoch 1 | Step 4000 | loss = 0.646, acc = 0.787


 52%|█████▏    | 4099/7923 [24:56<25:45,  2.47it/s]

Epoch 1 | Step 4100 | loss = 0.681, acc = 0.757


 53%|█████▎    | 4199/7923 [25:32<25:31,  2.43it/s]

Epoch 1 | Step 4200 | loss = 0.590, acc = 0.775


 54%|█████▍    | 4299/7923 [26:09<24:36,  2.45it/s]

Epoch 1 | Step 4300 | loss = 0.648, acc = 0.740


 56%|█████▌    | 4399/7923 [26:45<23:13,  2.53it/s]

Epoch 1 | Step 4400 | loss = 0.505, acc = 0.772


 57%|█████▋    | 4499/7923 [27:20<23:20,  2.44it/s]

Epoch 1 | Step 4500 | loss = 0.586, acc = 0.765


 58%|█████▊    | 4599/7923 [27:56<22:12,  2.50it/s]

Epoch 1 | Step 4600 | loss = 0.483, acc = 0.797


 59%|█████▉    | 4699/7923 [28:33<21:37,  2.48it/s]

Epoch 1 | Step 4700 | loss = 0.488, acc = 0.772


 61%|██████    | 4799/7923 [29:10<21:59,  2.37it/s]

Epoch 1 | Step 4800 | loss = 0.545, acc = 0.795


 62%|██████▏   | 4899/7923 [29:47<20:27,  2.46it/s]

Epoch 1 | Step 4900 | loss = 0.634, acc = 0.752


 63%|██████▎   | 4999/7923 [30:24<19:46,  2.46it/s]

Epoch 1 | Step 5000 | loss = 0.511, acc = 0.792


 64%|██████▍   | 5099/7923 [31:01<19:34,  2.40it/s]

Epoch 1 | Step 5100 | loss = 0.424, acc = 0.827


 66%|██████▌   | 5199/7923 [31:38<18:53,  2.40it/s]

Epoch 1 | Step 5200 | loss = 0.542, acc = 0.790


 67%|██████▋   | 5299/7923 [32:15<17:43,  2.47it/s]

Epoch 1 | Step 5300 | loss = 0.480, acc = 0.803


 68%|██████▊   | 5399/7923 [32:52<18:01,  2.33it/s]

Epoch 1 | Step 5400 | loss = 0.521, acc = 0.785


 69%|██████▉   | 5499/7923 [33:29<16:31,  2.44it/s]

Epoch 1 | Step 5500 | loss = 0.506, acc = 0.770


 71%|███████   | 5599/7923 [34:05<15:51,  2.44it/s]

Epoch 1 | Step 5600 | loss = 0.623, acc = 0.722


 72%|███████▏  | 5699/7923 [34:42<14:58,  2.48it/s]

Epoch 1 | Step 5700 | loss = 0.598, acc = 0.767


 73%|███████▎  | 5799/7923 [35:18<14:23,  2.46it/s]

Epoch 1 | Step 5800 | loss = 0.535, acc = 0.757


 74%|███████▍  | 5899/7923 [35:55<13:54,  2.43it/s]

Epoch 1 | Step 5900 | loss = 0.506, acc = 0.795


 76%|███████▌  | 5999/7923 [36:32<13:48,  2.32it/s]

Epoch 1 | Step 6000 | loss = 0.543, acc = 0.752


 77%|███████▋  | 6099/7923 [37:09<12:55,  2.35it/s]

Epoch 1 | Step 6100 | loss = 0.483, acc = 0.820


 78%|███████▊  | 6199/7923 [37:46<11:46,  2.44it/s]

Epoch 1 | Step 6200 | loss = 0.550, acc = 0.815


 80%|███████▉  | 6299/7923 [38:23<10:46,  2.51it/s]

Epoch 1 | Step 6300 | loss = 0.607, acc = 0.755


 81%|████████  | 6399/7923 [39:00<10:11,  2.49it/s]

Epoch 1 | Step 6400 | loss = 0.489, acc = 0.790


 82%|████████▏ | 6499/7923 [39:37<09:53,  2.40it/s]

Epoch 1 | Step 6500 | loss = 0.659, acc = 0.777


 83%|████████▎ | 6599/7923 [40:14<08:55,  2.47it/s]

Epoch 1 | Step 6600 | loss = 0.494, acc = 0.795


 85%|████████▍ | 6699/7923 [40:51<08:37,  2.36it/s]

Epoch 1 | Step 6700 | loss = 0.467, acc = 0.800


 86%|████████▌ | 6799/7923 [41:28<07:26,  2.52it/s]

Epoch 1 | Step 6800 | loss = 0.484, acc = 0.797


 87%|████████▋ | 6899/7923 [42:05<07:05,  2.41it/s]

Epoch 1 | Step 6900 | loss = 0.507, acc = 0.782


 88%|████████▊ | 6999/7923 [42:41<06:14,  2.47it/s]

Epoch 1 | Step 7000 | loss = 0.453, acc = 0.810


 90%|████████▉ | 7099/7923 [43:18<05:40,  2.42it/s]

Epoch 1 | Step 7100 | loss = 0.488, acc = 0.797


 91%|█████████ | 7199/7923 [43:54<05:01,  2.40it/s]

Epoch 1 | Step 7200 | loss = 0.501, acc = 0.803


 92%|█████████▏| 7299/7923 [44:31<04:07,  2.52it/s]

Epoch 1 | Step 7300 | loss = 0.560, acc = 0.790


 93%|█████████▎| 7399/7923 [45:08<03:40,  2.38it/s]

Epoch 1 | Step 7400 | loss = 0.466, acc = 0.800


 95%|█████████▍| 7499/7923 [45:45<02:54,  2.42it/s]

Epoch 1 | Step 7500 | loss = 0.590, acc = 0.762


 96%|█████████▌| 7599/7923 [46:22<02:17,  2.36it/s]

Epoch 1 | Step 7600 | loss = 0.516, acc = 0.760


 97%|█████████▋| 7699/7923 [46:59<01:33,  2.40it/s]

Epoch 1 | Step 7700 | loss = 0.502, acc = 0.792


 98%|█████████▊| 7799/7923 [47:36<00:50,  2.45it/s]

Epoch 1 | Step 7800 | loss = 0.581, acc = 0.795


100%|█████████▉| 7899/7923 [48:13<00:09,  2.42it/s]

Epoch 1 | Step 7900 | loss = 0.369, acc = 0.835


100%|██████████| 7923/7923 [48:22<00:00,  2.73it/s]


Evaluating Dev Set ...


  6%|▌         | 228/4131 [00:54<15:57,  4.07it/s]

found [UNK] in prediction.
original pred: 李 [UNK]
final prediction 李杲


  9%|▉         | 390/4131 [01:33<15:23,  4.05it/s]

found [UNK] in prediction.
original pred: 朱 允 [UNK]
final prediction 朱允炆


 10%|▉         | 393/4131 [01:34<13:14,  4.70it/s]

found [UNK] in prediction.
original pred: [UNK] 崎 八 幡 宮
final prediction 筥崎八幡宮


 12%|█▏        | 504/4131 [02:00<12:58,  4.66it/s]

found [UNK] in prediction.
original pred: 與 慕 容 [UNK] 雙 方 不 和
final prediction 與慕容廆雙方不和


 17%|█▋        | 700/4131 [02:47<13:25,  4.26it/s]

found [UNK] in prediction.
original pred: 對 日 本 報 紙 的 無 恥 造 謠 誣 [UNK] ， 進 行 了 有 力 駁 斥
final prediction 對日本報紙的無恥造謠誣衊，進行了有力駁斥


 20%|██        | 830/4131 [03:17<11:35,  4.75it/s]

found [UNK] in prediction.
original pred: 木 骨 [UNK]
final prediction 木骨閭


 20%|██        | 842/4131 [03:20<11:36,  4.72it/s]

found [UNK] in prediction.
original pred: 杜 恆 - [UNK] 因 論 題
final prediction 杜恆-蒯因論題


 24%|██▍       | 984/4131 [03:54<12:12,  4.30it/s]

found [UNK] in prediction.
original pred: 青 翁 三 足 [UNK]
final prediction 青翁三足缶


 34%|███▎      | 1384/4131 [05:29<09:57,  4.60it/s]

found [UNK] in prediction.
original pred: [UNK] [UNK]
final prediction 蛞蝓


 34%|███▍      | 1401/4131 [05:33<11:50,  3.84it/s]

found [UNK] in prediction.
original pred: 朱 允 [UNK]
final prediction 朱允炆


 37%|███▋      | 1512/4131 [05:59<10:27,  4.17it/s]

found [UNK] in prediction.
original pred: 為 免 再 次 爆 發 內 [UNK]
final prediction 為免再次爆發內訌


 45%|████▍     | 1846/4131 [07:20<07:37,  5.00it/s]

found [UNK] in prediction.
original pred: [UNK] 船
final prediction 艚船


 53%|█████▎    | 2205/4131 [08:44<07:41,  4.18it/s]

found [UNK] in prediction.
original pred: 朱 載 [UNK]
final prediction 朱載堉


 55%|█████▍    | 2262/4131 [08:58<07:28,  4.17it/s]

found [UNK] in prediction.
original pred: 學 習 訓 [UNK] 學
final prediction 那裏學習訓


 58%|█████▊    | 2400/4131 [09:31<05:51,  4.93it/s]

found [UNK] in prediction.
original pred: 朱 允 [UNK] 的 禁 殺 之 旨
final prediction 朱允炆的禁殺之旨


 64%|██████▍   | 2662/4131 [10:33<05:53,  4.15it/s]

found [UNK] in prediction.
original pred: 李 端 [UNK]
final prediction 李端棻


 67%|██████▋   | 2752/4131 [10:54<05:13,  4.40it/s]

found [UNK] in prediction.
original pred: w. v. [UNK] 因
final prediction W.V.蒯因


 71%|███████   | 2937/4131 [11:38<05:55,  3.36it/s]

found [UNK] in prediction.
original pred: 拓 跋 [UNK]
final prediction 拓跋燾


 77%|███████▋  | 3167/4131 [12:30<02:36,  6.15it/s]

found [UNK] in prediction.
original pred: [UNK] [UNK]
final prediction 蛞蝓


 77%|███████▋  | 3168/4131 [12:30<02:52,  5.59it/s]

found [UNK] in prediction.
original pred: [UNK] [UNK]
final prediction 蛞蝓


 96%|█████████▋| 3982/4131 [15:22<00:32,  4.53it/s]

found [UNK] in prediction.
original pred: 學 習 訓 [UNK] 學
final prediction 學習訓詁學


100%|██████████| 4131/4131 [15:57<00:00,  4.32it/s]


Validation | Epoch 1 | acc = 0.787


  1%|          | 99/7923 [00:35<54:07,  2.41it/s]

Epoch 2 | Step 100 | loss = 0.328, acc = 0.862


  3%|▎         | 199/7923 [01:12<50:59,  2.52it/s]

Epoch 2 | Step 200 | loss = 0.302, acc = 0.860


  4%|▍         | 299/7923 [01:48<51:11,  2.48it/s]

Epoch 2 | Step 300 | loss = 0.365, acc = 0.812


  5%|▌         | 399/7923 [02:25<50:39,  2.48it/s]

Epoch 2 | Step 400 | loss = 0.343, acc = 0.868


  6%|▋         | 499/7923 [03:02<50:50,  2.43it/s]

Epoch 2 | Step 500 | loss = 0.450, acc = 0.842


  8%|▊         | 599/7923 [03:39<50:09,  2.43it/s]

Epoch 2 | Step 600 | loss = 0.326, acc = 0.845


  9%|▉         | 699/7923 [04:16<49:45,  2.42it/s]

Epoch 2 | Step 700 | loss = 0.267, acc = 0.868


 10%|█         | 799/7923 [04:53<48:56,  2.43it/s]

Epoch 2 | Step 800 | loss = 0.296, acc = 0.850


 11%|█▏        | 899/7923 [05:30<47:20,  2.47it/s]

Epoch 2 | Step 900 | loss = 0.295, acc = 0.873


 13%|█▎        | 999/7923 [06:06<46:05,  2.50it/s]

Epoch 2 | Step 1000 | loss = 0.396, acc = 0.850


 14%|█▍        | 1099/7923 [06:43<46:52,  2.43it/s]

Epoch 2 | Step 1100 | loss = 0.291, acc = 0.862


 15%|█▌        | 1199/7923 [07:19<44:47,  2.50it/s]

Epoch 2 | Step 1200 | loss = 0.325, acc = 0.847


 16%|█▋        | 1299/7923 [07:56<44:46,  2.47it/s]

Epoch 2 | Step 1300 | loss = 0.375, acc = 0.840


 18%|█▊        | 1399/7923 [08:32<44:09,  2.46it/s]

Epoch 2 | Step 1400 | loss = 0.310, acc = 0.875


 19%|█▉        | 1499/7923 [09:08<42:51,  2.50it/s]

Epoch 2 | Step 1500 | loss = 0.342, acc = 0.857


 20%|██        | 1599/7923 [09:44<42:31,  2.48it/s]

Epoch 2 | Step 1600 | loss = 0.292, acc = 0.885


 21%|██▏       | 1699/7923 [10:20<42:32,  2.44it/s]

Epoch 2 | Step 1700 | loss = 0.267, acc = 0.875


 23%|██▎       | 1799/7923 [10:57<40:44,  2.50it/s]

Epoch 2 | Step 1800 | loss = 0.250, acc = 0.857


 24%|██▍       | 1899/7923 [11:33<41:38,  2.41it/s]

Epoch 2 | Step 1900 | loss = 0.340, acc = 0.855


 25%|██▌       | 1999/7923 [12:10<39:38,  2.49it/s]

Epoch 2 | Step 2000 | loss = 0.280, acc = 0.857


 26%|██▋       | 2099/7923 [12:47<39:31,  2.46it/s]

Epoch 2 | Step 2100 | loss = 0.313, acc = 0.865


 28%|██▊       | 2199/7923 [13:24<39:10,  2.44it/s]

Epoch 2 | Step 2200 | loss = 0.318, acc = 0.840


 29%|██▉       | 2299/7923 [14:01<38:54,  2.41it/s]

Epoch 2 | Step 2300 | loss = 0.344, acc = 0.827


 30%|███       | 2399/7923 [14:38<37:42,  2.44it/s]

Epoch 2 | Step 2400 | loss = 0.296, acc = 0.855


 32%|███▏      | 2499/7923 [15:15<36:31,  2.48it/s]

Epoch 2 | Step 2500 | loss = 0.354, acc = 0.860


 33%|███▎      | 2599/7923 [15:51<35:55,  2.47it/s]

Epoch 2 | Step 2600 | loss = 0.330, acc = 0.855


 34%|███▍      | 2699/7923 [16:28<34:35,  2.52it/s]

Epoch 2 | Step 2700 | loss = 0.374, acc = 0.832


 35%|███▌      | 2799/7923 [17:04<33:45,  2.53it/s]

Epoch 2 | Step 2800 | loss = 0.373, acc = 0.835


 37%|███▋      | 2899/7923 [17:41<34:02,  2.46it/s]

Epoch 2 | Step 2900 | loss = 0.403, acc = 0.832


 38%|███▊      | 2999/7923 [18:17<33:30,  2.45it/s]

Epoch 2 | Step 3000 | loss = 0.312, acc = 0.875


 39%|███▉      | 3099/7923 [18:53<31:16,  2.57it/s]

Epoch 2 | Step 3100 | loss = 0.367, acc = 0.837


 40%|████      | 3199/7923 [19:29<30:50,  2.55it/s]

Epoch 2 | Step 3200 | loss = 0.333, acc = 0.855


 42%|████▏     | 3299/7923 [20:05<30:43,  2.51it/s]

Epoch 2 | Step 3300 | loss = 0.365, acc = 0.850


 43%|████▎     | 3399/7923 [20:41<30:36,  2.46it/s]

Epoch 2 | Step 3400 | loss = 0.299, acc = 0.882


 44%|████▍     | 3499/7923 [21:17<30:08,  2.45it/s]

Epoch 2 | Step 3500 | loss = 0.254, acc = 0.873


 45%|████▌     | 3599/7923 [21:54<29:39,  2.43it/s]

Epoch 2 | Step 3600 | loss = 0.333, acc = 0.857


 47%|████▋     | 3699/7923 [22:31<28:09,  2.50it/s]

Epoch 2 | Step 3700 | loss = 0.376, acc = 0.847


 48%|████▊     | 3799/7923 [23:08<28:31,  2.41it/s]

Epoch 2 | Step 3800 | loss = 0.344, acc = 0.857


 49%|████▉     | 3899/7923 [23:44<28:10,  2.38it/s]

Epoch 2 | Step 3900 | loss = 0.355, acc = 0.850


 50%|█████     | 3999/7923 [24:21<27:10,  2.41it/s]

Epoch 2 | Step 4000 | loss = 0.435, acc = 0.817


 52%|█████▏    | 4099/7923 [24:58<25:45,  2.47it/s]

Epoch 2 | Step 4100 | loss = 0.313, acc = 0.895


 53%|█████▎    | 4199/7923 [25:35<24:58,  2.49it/s]

Epoch 2 | Step 4200 | loss = 0.260, acc = 0.887


 54%|█████▍    | 4299/7923 [26:12<24:48,  2.44it/s]

Epoch 2 | Step 4300 | loss = 0.362, acc = 0.847


 56%|█████▌    | 4399/7923 [26:48<23:24,  2.51it/s]

Epoch 2 | Step 4400 | loss = 0.291, acc = 0.868


 57%|█████▋    | 4499/7923 [27:25<23:33,  2.42it/s]

Epoch 2 | Step 4500 | loss = 0.278, acc = 0.852


 58%|█████▊    | 4599/7923 [28:01<21:58,  2.52it/s]

Epoch 2 | Step 4600 | loss = 0.332, acc = 0.862


 59%|█████▉    | 4699/7923 [28:37<21:17,  2.52it/s]

Epoch 2 | Step 4700 | loss = 0.330, acc = 0.830


 61%|██████    | 4799/7923 [29:13<20:57,  2.48it/s]

Epoch 2 | Step 4800 | loss = 0.280, acc = 0.862


 62%|██████▏   | 4899/7923 [29:50<20:29,  2.46it/s]

Epoch 2 | Step 4900 | loss = 0.296, acc = 0.852


 63%|██████▎   | 4999/7923 [30:27<20:31,  2.37it/s]

Epoch 2 | Step 5000 | loss = 0.355, acc = 0.857


 64%|██████▍   | 5099/7923 [31:04<18:58,  2.48it/s]

Epoch 2 | Step 5100 | loss = 0.412, acc = 0.847


 66%|██████▌   | 5199/7923 [31:40<18:31,  2.45it/s]

Epoch 2 | Step 5200 | loss = 0.320, acc = 0.835


 67%|██████▋   | 5299/7923 [32:17<18:14,  2.40it/s]

Epoch 2 | Step 5300 | loss = 0.274, acc = 0.855


 68%|██████▊   | 5399/7923 [32:54<17:04,  2.46it/s]

Epoch 2 | Step 5400 | loss = 0.333, acc = 0.855


 69%|██████▉   | 5499/7923 [33:30<16:18,  2.48it/s]

Epoch 2 | Step 5500 | loss = 0.337, acc = 0.860


 71%|███████   | 5599/7923 [34:06<15:28,  2.50it/s]

Epoch 2 | Step 5600 | loss = 0.313, acc = 0.855


 72%|███████▏  | 5699/7923 [34:42<14:45,  2.51it/s]

Epoch 2 | Step 5700 | loss = 0.277, acc = 0.885


 73%|███████▎  | 5799/7923 [35:19<15:04,  2.35it/s]

Epoch 2 | Step 5800 | loss = 0.320, acc = 0.842


 74%|███████▍  | 5899/7923 [35:56<13:53,  2.43it/s]

Epoch 2 | Step 5900 | loss = 0.346, acc = 0.847


 76%|███████▌  | 5999/7923 [36:33<13:08,  2.44it/s]

Epoch 2 | Step 6000 | loss = 0.417, acc = 0.817


 77%|███████▋  | 6099/7923 [37:10<12:26,  2.44it/s]

Epoch 2 | Step 6100 | loss = 0.267, acc = 0.868


 78%|███████▊  | 6199/7923 [37:47<11:37,  2.47it/s]

Epoch 2 | Step 6200 | loss = 0.319, acc = 0.862


 80%|███████▉  | 6299/7923 [38:23<11:07,  2.43it/s]

Epoch 2 | Step 6300 | loss = 0.378, acc = 0.817


 81%|████████  | 6399/7923 [39:00<10:32,  2.41it/s]

Epoch 2 | Step 6400 | loss = 0.310, acc = 0.860


 82%|████████▏ | 6499/7923 [39:36<09:38,  2.46it/s]

Epoch 2 | Step 6500 | loss = 0.246, acc = 0.895


 83%|████████▎ | 6599/7923 [40:12<08:45,  2.52it/s]

Epoch 2 | Step 6600 | loss = 0.346, acc = 0.837


 85%|████████▍ | 6699/7923 [40:48<08:08,  2.51it/s]

Epoch 2 | Step 6700 | loss = 0.367, acc = 0.855


 86%|████████▌ | 6799/7923 [41:24<07:43,  2.42it/s]

Epoch 2 | Step 6800 | loss = 0.340, acc = 0.837


 87%|████████▋ | 6899/7923 [42:01<07:06,  2.40it/s]

Epoch 2 | Step 6900 | loss = 0.236, acc = 0.887


 88%|████████▊ | 6999/7923 [42:38<06:13,  2.47it/s]

Epoch 2 | Step 7000 | loss = 0.273, acc = 0.857


 90%|████████▉ | 7099/7923 [43:14<05:38,  2.43it/s]

Epoch 2 | Step 7100 | loss = 0.340, acc = 0.857


 91%|█████████ | 7199/7923 [43:51<04:56,  2.45it/s]

Epoch 2 | Step 7200 | loss = 0.239, acc = 0.873


 92%|█████████▏| 7299/7923 [44:28<04:13,  2.46it/s]

Epoch 2 | Step 7300 | loss = 0.361, acc = 0.860


 93%|█████████▎| 7399/7923 [45:05<03:38,  2.40it/s]

Epoch 2 | Step 7400 | loss = 0.309, acc = 0.877


 95%|█████████▍| 7499/7923 [45:41<02:51,  2.47it/s]

Epoch 2 | Step 7500 | loss = 0.288, acc = 0.847


 96%|█████████▌| 7599/7923 [46:18<02:13,  2.43it/s]

Epoch 2 | Step 7600 | loss = 0.309, acc = 0.847


 97%|█████████▋| 7699/7923 [46:54<01:28,  2.53it/s]

Epoch 2 | Step 7700 | loss = 0.269, acc = 0.882


 98%|█████████▊| 7799/7923 [47:30<00:49,  2.49it/s]

Epoch 2 | Step 7800 | loss = 0.316, acc = 0.860


100%|█████████▉| 7899/7923 [48:06<00:09,  2.48it/s]

Epoch 2 | Step 7900 | loss = 0.320, acc = 0.870


100%|██████████| 7923/7923 [48:15<00:00,  2.74it/s]


Evaluating Dev Set ...


  6%|▌         | 228/4131 [01:00<17:56,  3.63it/s]

found [UNK] in prediction.
original pred: 李 [UNK]
final prediction 李杲


  9%|▉         | 390/4131 [01:45<17:35,  3.54it/s]

found [UNK] in prediction.
original pred: 朱 允 [UNK]
final prediction 朱允炆


 10%|▉         | 393/4131 [01:45<14:56,  4.17it/s]

found [UNK] in prediction.
original pred: [UNK] 崎 八 幡 宮
final prediction 筥崎八幡宮


 12%|█▏        | 504/4131 [02:16<15:07,  4.00it/s]

found [UNK] in prediction.
original pred: 與 慕 容 [UNK] 雙 方 不 和
final prediction 與慕容廆雙方不和


 17%|█▋        | 700/4131 [03:09<14:53,  3.84it/s]

found [UNK] in prediction.
original pred: 對 日 本 報 紙 的 無 恥 造 謠 誣 [UNK] ， 進 行 了 有 力 駁 斥
final prediction 對日本報紙的無恥造謠誣衊，進行了有力駁斥


 20%|██        | 829/4131 [03:44<14:48,  3.72it/s]

found [UNK] in prediction.
original pred: 木 骨 [UNK]
final prediction 木骨閭


 20%|██        | 842/4131 [03:47<13:03,  4.20it/s]

found [UNK] in prediction.
original pred: 杜 恆 - [UNK] 因 論 題
final prediction 杜恆-蒯因論題


 24%|██▍       | 984/4131 [04:27<14:15,  3.68it/s]

found [UNK] in prediction.
original pred: 青 翁 三 足 [UNK]
final prediction 青翁三足缶


 34%|███▎      | 1384/4131 [06:14<11:17,  4.05it/s]

found [UNK] in prediction.
original pred: [UNK] [UNK]
final prediction 蛞蝓


 34%|███▍      | 1401/4131 [06:19<13:05,  3.48it/s]

found [UNK] in prediction.
original pred: 朱 允 [UNK]
final prediction 朱允炆


 35%|███▍      | 1443/4131 [06:31<13:00,  3.45it/s]

found [UNK] in prediction.
original pred: 劉 [UNK]
final prediction 劉炟


 37%|███▋      | 1512/4131 [06:49<11:19,  3.85it/s]

found [UNK] in prediction.
original pred: 免 再 次 爆 發 內 [UNK]
final prediction 免再次爆發內訌


 45%|████▍     | 1846/4131 [08:18<08:23,  4.54it/s]

found [UNK] in prediction.
original pred: [UNK] 船
final prediction 艚船


 53%|█████▎    | 2205/4131 [09:53<08:29,  3.78it/s]

found [UNK] in prediction.
original pred: 朱 載 [UNK]
final prediction 朱載堉


 55%|█████▍    | 2262/4131 [10:09<08:44,  3.57it/s]

found [UNK] in prediction.
original pred: 陳 梅 坪 那 裏 學 習 訓 [UNK] 學
final prediction 佛山陳梅坪那裏學習訓


 58%|█████▊    | 2400/4131 [10:46<06:51,  4.20it/s]

found [UNK] in prediction.
original pred: 朱 允 [UNK] 的 禁 殺 之 旨
final prediction 朱允炆的禁殺之旨


 64%|██████▍   | 2662/4131 [11:57<06:47,  3.60it/s]

found [UNK] in prediction.
original pred: 李 端 [UNK]
final prediction 李端棻


 67%|██████▋   | 2752/4131 [12:20<05:52,  3.91it/s]

found [UNK] in prediction.
original pred: w. v. [UNK] 因
final prediction W.V.蒯因


 71%|███████   | 2937/4131 [13:11<06:47,  2.93it/s]

found [UNK] in prediction.
original pred: 拓 跋 [UNK]
final prediction 拓跋燾


 77%|███████▋  | 3168/4131 [14:10<02:57,  5.43it/s]

found [UNK] in prediction.
original pred: [UNK] [UNK]
final prediction 蛞蝓


 90%|████████▉ | 3717/4131 [16:14<01:14,  5.54it/s]

found [UNK] in prediction.
original pred: 拓 跋 [UNK]
final prediction 拓跋燾


100%|██████████| 4131/4131 [17:46<00:00,  3.87it/s]


Validation | Epoch 2 | acc = 0.777


  1%|          | 99/7923 [00:36<55:35,  2.35it/s]

Epoch 3 | Step 100 | loss = 0.213, acc = 0.890


  3%|▎         | 199/7923 [01:13<51:40,  2.49it/s]

Epoch 3 | Step 200 | loss = 0.164, acc = 0.907


  4%|▍         | 299/7923 [01:50<50:40,  2.51it/s]

Epoch 3 | Step 300 | loss = 0.179, acc = 0.922


  5%|▌         | 399/7923 [02:26<50:56,  2.46it/s]

Epoch 3 | Step 400 | loss = 0.168, acc = 0.922


  6%|▋         | 499/7923 [03:04<52:31,  2.36it/s]

Epoch 3 | Step 500 | loss = 0.185, acc = 0.910


  8%|▊         | 599/7923 [03:40<48:56,  2.49it/s]

Epoch 3 | Step 600 | loss = 0.171, acc = 0.933


  9%|▉         | 699/7923 [04:17<50:51,  2.37it/s]

Epoch 3 | Step 700 | loss = 0.127, acc = 0.938


 10%|█         | 799/7923 [04:54<47:02,  2.52it/s]

Epoch 3 | Step 800 | loss = 0.165, acc = 0.922


 11%|█▏        | 899/7923 [05:30<47:56,  2.44it/s]

Epoch 3 | Step 900 | loss = 0.196, acc = 0.920


 13%|█▎        | 999/7923 [06:07<46:09,  2.50it/s]

Epoch 3 | Step 1000 | loss = 0.167, acc = 0.915


 14%|█▍        | 1099/7923 [06:43<46:15,  2.46it/s]

Epoch 3 | Step 1100 | loss = 0.192, acc = 0.915


 15%|█▌        | 1199/7923 [07:19<44:40,  2.51it/s]

Epoch 3 | Step 1200 | loss = 0.141, acc = 0.925


 16%|█▋        | 1299/7923 [07:55<43:06,  2.56it/s]

Epoch 3 | Step 1300 | loss = 0.193, acc = 0.915


 18%|█▊        | 1399/7923 [08:31<43:26,  2.50it/s]

Epoch 3 | Step 1400 | loss = 0.175, acc = 0.933


 19%|█▉        | 1499/7923 [09:07<44:54,  2.38it/s]

Epoch 3 | Step 1500 | loss = 0.201, acc = 0.915


 20%|██        | 1599/7923 [09:44<42:46,  2.46it/s]

Epoch 3 | Step 1600 | loss = 0.208, acc = 0.907


 21%|██▏       | 1699/7923 [10:21<42:19,  2.45it/s]

Epoch 3 | Step 1700 | loss = 0.138, acc = 0.925


 23%|██▎       | 1799/7923 [10:58<42:45,  2.39it/s]

Epoch 3 | Step 1800 | loss = 0.157, acc = 0.930


 24%|██▍       | 1899/7923 [11:35<40:56,  2.45it/s]

Epoch 3 | Step 1900 | loss = 0.132, acc = 0.935


 25%|██▌       | 1999/7923 [12:12<39:26,  2.50it/s]

Epoch 3 | Step 2000 | loss = 0.142, acc = 0.905


 26%|██▋       | 2099/7923 [12:49<40:14,  2.41it/s]

Epoch 3 | Step 2100 | loss = 0.146, acc = 0.927


 28%|██▊       | 2199/7923 [13:26<38:32,  2.48it/s]

Epoch 3 | Step 2200 | loss = 0.165, acc = 0.920


 29%|██▉       | 2299/7923 [14:02<38:08,  2.46it/s]

Epoch 3 | Step 2300 | loss = 0.187, acc = 0.917


 30%|███       | 2399/7923 [14:39<36:52,  2.50it/s]

Epoch 3 | Step 2400 | loss = 0.154, acc = 0.930


 32%|███▏      | 2499/7923 [15:15<35:49,  2.52it/s]

Epoch 3 | Step 2500 | loss = 0.243, acc = 0.902


 33%|███▎      | 2599/7923 [15:51<35:40,  2.49it/s]

Epoch 3 | Step 2600 | loss = 0.172, acc = 0.892


 34%|███▍      | 2699/7923 [16:27<35:04,  2.48it/s]

Epoch 3 | Step 2700 | loss = 0.146, acc = 0.942


 35%|███▌      | 2799/7923 [17:03<34:05,  2.51it/s]

Epoch 3 | Step 2800 | loss = 0.175, acc = 0.930


 37%|███▋      | 2899/7923 [17:39<34:40,  2.41it/s]

Epoch 3 | Step 2900 | loss = 0.207, acc = 0.925


 38%|███▊      | 2999/7923 [18:16<33:29,  2.45it/s]

Epoch 3 | Step 3000 | loss = 0.227, acc = 0.912


 39%|███▉      | 3099/7923 [18:53<31:56,  2.52it/s]

Epoch 3 | Step 3100 | loss = 0.193, acc = 0.925


 40%|████      | 3199/7923 [19:30<32:18,  2.44it/s]

Epoch 3 | Step 3200 | loss = 0.176, acc = 0.920


 42%|████▏     | 3299/7923 [20:07<32:08,  2.40it/s]

Epoch 3 | Step 3300 | loss = 0.145, acc = 0.915


 43%|████▎     | 3399/7923 [20:43<30:27,  2.48it/s]

Epoch 3 | Step 3400 | loss = 0.110, acc = 0.935


 44%|████▍     | 3499/7923 [21:21<29:58,  2.46it/s]

Epoch 3 | Step 3500 | loss = 0.145, acc = 0.915


 45%|████▌     | 3599/7923 [21:57<29:06,  2.48it/s]

Epoch 3 | Step 3600 | loss = 0.203, acc = 0.897


 47%|████▋     | 3699/7923 [22:34<29:09,  2.41it/s]

Epoch 3 | Step 3700 | loss = 0.194, acc = 0.917


 48%|████▊     | 3799/7923 [23:10<27:15,  2.52it/s]

Epoch 3 | Step 3800 | loss = 0.146, acc = 0.900


 49%|████▉     | 3899/7923 [23:46<26:26,  2.54it/s]

Epoch 3 | Step 3900 | loss = 0.181, acc = 0.927


 50%|█████     | 3999/7923 [24:22<26:23,  2.48it/s]

Epoch 3 | Step 4000 | loss = 0.167, acc = 0.907


 52%|█████▏    | 4099/7923 [24:58<24:29,  2.60it/s]

Epoch 3 | Step 4100 | loss = 0.241, acc = 0.902


 53%|█████▎    | 4199/7923 [25:34<25:19,  2.45it/s]

Epoch 3 | Step 4200 | loss = 0.164, acc = 0.930


 54%|█████▍    | 4299/7923 [26:10<24:29,  2.47it/s]

Epoch 3 | Step 4300 | loss = 0.113, acc = 0.940


 56%|█████▌    | 4399/7923 [26:47<23:39,  2.48it/s]

Epoch 3 | Step 4400 | loss = 0.247, acc = 0.920


 57%|█████▋    | 4499/7923 [27:24<23:43,  2.40it/s]

Epoch 3 | Step 4500 | loss = 0.168, acc = 0.915


 58%|█████▊    | 4599/7923 [28:01<22:49,  2.43it/s]

Epoch 3 | Step 4600 | loss = 0.173, acc = 0.912


 59%|█████▉    | 4699/7923 [28:37<22:08,  2.43it/s]

Epoch 3 | Step 4700 | loss = 0.199, acc = 0.892


 61%|██████    | 4799/7923 [29:14<21:29,  2.42it/s]

Epoch 3 | Step 4800 | loss = 0.204, acc = 0.907


 62%|██████▏   | 4899/7923 [29:51<20:16,  2.48it/s]

Epoch 3 | Step 4900 | loss = 0.161, acc = 0.922


 63%|██████▎   | 4999/7923 [30:28<20:23,  2.39it/s]

Epoch 3 | Step 5000 | loss = 0.190, acc = 0.920


 64%|██████▍   | 5099/7923 [31:05<19:22,  2.43it/s]

Epoch 3 | Step 5100 | loss = 0.196, acc = 0.897


 66%|██████▌   | 5199/7923 [31:41<18:54,  2.40it/s]

Epoch 3 | Step 5200 | loss = 0.145, acc = 0.912


 67%|██████▋   | 5299/7923 [32:18<17:22,  2.52it/s]

Epoch 3 | Step 5300 | loss = 0.126, acc = 0.940


 68%|██████▊   | 5399/7923 [32:55<17:08,  2.46it/s]

Epoch 3 | Step 5400 | loss = 0.139, acc = 0.938


 69%|██████▉   | 5499/7923 [33:31<15:53,  2.54it/s]

Epoch 3 | Step 5500 | loss = 0.181, acc = 0.915


 71%|███████   | 5599/7923 [34:07<16:01,  2.42it/s]

Epoch 3 | Step 5600 | loss = 0.218, acc = 0.885


 72%|███████▏  | 5699/7923 [34:44<14:51,  2.50it/s]

Epoch 3 | Step 5700 | loss = 0.229, acc = 0.892


 73%|███████▎  | 5799/7923 [35:20<14:07,  2.51it/s]

Epoch 3 | Step 5800 | loss = 0.224, acc = 0.892


 74%|███████▍  | 5899/7923 [35:56<13:36,  2.48it/s]

Epoch 3 | Step 5900 | loss = 0.193, acc = 0.900


 76%|███████▌  | 5999/7923 [36:32<12:55,  2.48it/s]

Epoch 3 | Step 6000 | loss = 0.207, acc = 0.900


 77%|███████▋  | 6099/7923 [37:07<12:01,  2.53it/s]

Epoch 3 | Step 6100 | loss = 0.227, acc = 0.897


 78%|███████▊  | 6199/7923 [37:43<11:21,  2.53it/s]

Epoch 3 | Step 6200 | loss = 0.188, acc = 0.917


 80%|███████▉  | 6299/7923 [38:19<10:37,  2.55it/s]

Epoch 3 | Step 6300 | loss = 0.140, acc = 0.920


 81%|████████  | 6399/7923 [38:56<10:25,  2.44it/s]

Epoch 3 | Step 6400 | loss = 0.167, acc = 0.920


 82%|████████▏ | 6499/7923 [39:32<09:40,  2.45it/s]

Epoch 3 | Step 6500 | loss = 0.263, acc = 0.895


 83%|████████▎ | 6599/7923 [40:09<08:51,  2.49it/s]

Epoch 3 | Step 6600 | loss = 0.218, acc = 0.905


 85%|████████▍ | 6699/7923 [40:46<08:12,  2.48it/s]

Epoch 3 | Step 6700 | loss = 0.212, acc = 0.935


 86%|████████▌ | 6799/7923 [41:23<07:49,  2.40it/s]

Epoch 3 | Step 6800 | loss = 0.145, acc = 0.933


 87%|████████▋ | 6899/7923 [41:59<06:42,  2.55it/s]

Epoch 3 | Step 6900 | loss = 0.195, acc = 0.912


 88%|████████▊ | 6999/7923 [42:36<06:15,  2.46it/s]

Epoch 3 | Step 7000 | loss = 0.150, acc = 0.933


 90%|████████▉ | 7099/7923 [43:13<05:38,  2.43it/s]

Epoch 3 | Step 7100 | loss = 0.202, acc = 0.900


 91%|█████████ | 7199/7923 [43:50<05:01,  2.40it/s]

Epoch 3 | Step 7200 | loss = 0.199, acc = 0.897


 92%|█████████▏| 7299/7923 [44:27<04:18,  2.41it/s]

Epoch 3 | Step 7300 | loss = 0.199, acc = 0.938


 93%|█████████▎| 7399/7923 [45:03<03:31,  2.48it/s]

Epoch 3 | Step 7400 | loss = 0.208, acc = 0.900


 95%|█████████▍| 7499/7923 [45:40<02:48,  2.51it/s]

Epoch 3 | Step 7500 | loss = 0.191, acc = 0.930


 96%|█████████▌| 7599/7923 [46:16<02:10,  2.48it/s]

Epoch 3 | Step 7600 | loss = 0.168, acc = 0.922


 97%|█████████▋| 7699/7923 [46:52<01:27,  2.56it/s]

Epoch 3 | Step 7700 | loss = 0.208, acc = 0.933


 98%|█████████▊| 7799/7923 [47:28<00:50,  2.47it/s]

Epoch 3 | Step 7800 | loss = 0.191, acc = 0.917


100%|█████████▉| 7899/7923 [48:04<00:09,  2.49it/s]

Epoch 3 | Step 7900 | loss = 0.137, acc = 0.920


100%|██████████| 7923/7923 [48:13<00:00,  2.74it/s]


Evaluating Dev Set ...


  6%|▌         | 228/4131 [01:00<17:31,  3.71it/s]

found [UNK] in prediction.
original pred: 李 [UNK]
final prediction 李杲


  9%|▉         | 390/4131 [01:43<16:26,  3.79it/s]

found [UNK] in prediction.
original pred: 朱 允 [UNK]
final prediction 朱允炆


 10%|▉         | 393/4131 [01:44<15:07,  4.12it/s]

found [UNK] in prediction.
original pred: [UNK] 崎 八 幡 宮
final prediction 筥崎八幡宮


 12%|█▏        | 504/4131 [02:14<15:07,  4.00it/s]

found [UNK] in prediction.
original pred: 與 慕 容 [UNK] 雙 方 不 和
final prediction 與慕容廆雙方不和


 17%|█▋        | 700/4131 [03:07<15:05,  3.79it/s]

found [UNK] in prediction.
original pred: 對 日 本 報 紙 的 無 恥 造 謠 誣 [UNK] ， 進 行 了 有 力 駁 斥
final prediction 對日本報紙的無恥造謠誣衊，進行了有力駁斥


 20%|██        | 829/4131 [03:41<14:48,  3.72it/s]

found [UNK] in prediction.
original pred: 木 骨 [UNK]
final prediction 木骨閭


 20%|██        | 842/4131 [03:44<13:30,  4.06it/s]

found [UNK] in prediction.
original pred: 杜 恆 - [UNK] 因 論 題
final prediction 杜恆-蒯因論題


 24%|██▍       | 984/4131 [04:24<13:59,  3.75it/s]

found [UNK] in prediction.
original pred: 青 翁 三 足 [UNK]
final prediction 青翁三足缶


 34%|███▎      | 1384/4131 [06:12<11:09,  4.11it/s]

found [UNK] in prediction.
original pred: [UNK] [UNK]
final prediction 蛞蝓


 34%|███▍      | 1401/4131 [06:17<13:27,  3.38it/s]

found [UNK] in prediction.
original pred: 朱 允 [UNK]
final prediction 朱允炆


 35%|███▍      | 1443/4131 [06:29<13:10,  3.40it/s]

found [UNK] in prediction.
original pred: 劉 [UNK]
final prediction 劉炟


 37%|███▋      | 1512/4131 [06:47<12:18,  3.55it/s]

found [UNK] in prediction.
original pred: 為 免 再 次 爆 發 內 [UNK]
final prediction 為免再次爆發內訌


 45%|████▍     | 1846/4131 [08:17<08:34,  4.44it/s]

found [UNK] in prediction.
original pred: [UNK] 船
final prediction 艚船


 53%|█████▎    | 2205/4131 [09:51<08:32,  3.76it/s]

found [UNK] in prediction.
original pred: 朱 載 [UNK]
final prediction 朱載堉


 55%|█████▍    | 2262/4131 [10:07<08:06,  3.84it/s]

found [UNK] in prediction.
original pred: 學 習 訓 [UNK] 學
final prediction 那裏學習訓


 58%|█████▊    | 2400/4131 [10:43<06:27,  4.47it/s]

found [UNK] in prediction.
original pred: 朱 允 [UNK] 的 禁 殺 之 旨
final prediction 朱允炆的禁殺之旨


 64%|██████▍   | 2662/4131 [11:52<06:45,  3.62it/s]

found [UNK] in prediction.
original pred: 李 端 [UNK]
final prediction 李端棻


 67%|██████▋   | 2752/4131 [12:15<05:45,  3.99it/s]

found [UNK] in prediction.
original pred: w. v. [UNK] 因
final prediction W.V.蒯因


 71%|███████   | 2937/4131 [13:06<06:48,  2.92it/s]

found [UNK] in prediction.
original pred: 拓 跋 [UNK]
final prediction 拓跋燾


 77%|███████▋  | 3168/4131 [14:05<02:55,  5.49it/s]

found [UNK] in prediction.
original pred: [UNK] [UNK]
final prediction 蛞蝓


 90%|████████▉ | 3717/4131 [16:12<01:15,  5.45it/s]

found [UNK] in prediction.
original pred: 拓 跋 [UNK]
final prediction 拓跋燾


 96%|█████████▋| 3982/4131 [17:11<00:32,  4.56it/s]

found [UNK] in prediction.
original pred: 學 習 訓 [UNK] 學
final prediction 學習訓詁學


100%|██████████| 4131/4131 [17:45<00:00,  3.88it/s]


Validation | Epoch 3 | acc = 0.795


  1%|          | 99/7923 [00:35<53:22,  2.44it/s]

Epoch 4 | Step 100 | loss = 0.080, acc = 0.950


  3%|▎         | 199/7923 [01:11<50:47,  2.53it/s]

Epoch 4 | Step 200 | loss = 0.105, acc = 0.962


  4%|▍         | 299/7923 [01:47<51:02,  2.49it/s]

Epoch 4 | Step 300 | loss = 0.127, acc = 0.938


  5%|▌         | 399/7923 [02:24<49:54,  2.51it/s]

Epoch 4 | Step 400 | loss = 0.118, acc = 0.955


  6%|▋         | 499/7923 [03:00<51:41,  2.39it/s]

Epoch 4 | Step 500 | loss = 0.066, acc = 0.967


  8%|▊         | 599/7923 [03:37<49:06,  2.49it/s]

Epoch 4 | Step 600 | loss = 0.074, acc = 0.952


  9%|▉         | 699/7923 [04:14<50:27,  2.39it/s]

Epoch 4 | Step 700 | loss = 0.096, acc = 0.962


 10%|█         | 799/7923 [04:51<48:27,  2.45it/s]

Epoch 4 | Step 800 | loss = 0.066, acc = 0.955


 11%|█▏        | 899/7923 [05:28<49:00,  2.39it/s]

Epoch 4 | Step 900 | loss = 0.078, acc = 0.967


 13%|█▎        | 999/7923 [06:05<46:18,  2.49it/s]

Epoch 4 | Step 1000 | loss = 0.103, acc = 0.947


 14%|█▍        | 1099/7923 [06:41<45:18,  2.51it/s]

Epoch 4 | Step 1100 | loss = 0.073, acc = 0.960


 15%|█▌        | 1199/7923 [07:18<46:07,  2.43it/s]

Epoch 4 | Step 1200 | loss = 0.103, acc = 0.952


 16%|█▋        | 1299/7923 [07:54<43:14,  2.55it/s]

Epoch 4 | Step 1300 | loss = 0.125, acc = 0.965


 18%|█▊        | 1399/7923 [08:30<44:24,  2.45it/s]

Epoch 4 | Step 1400 | loss = 0.140, acc = 0.955


 19%|█▉        | 1499/7923 [09:05<42:49,  2.50it/s]

Epoch 4 | Step 1500 | loss = 0.087, acc = 0.960


 20%|██        | 1599/7923 [09:41<41:47,  2.52it/s]

Epoch 4 | Step 1600 | loss = 0.084, acc = 0.952


 21%|██▏       | 1699/7923 [10:18<43:55,  2.36it/s]

Epoch 4 | Step 1700 | loss = 0.086, acc = 0.942


 23%|██▎       | 1799/7923 [10:54<41:34,  2.46it/s]

Epoch 4 | Step 1800 | loss = 0.105, acc = 0.947


 24%|██▍       | 1899/7923 [11:31<42:05,  2.38it/s]

Epoch 4 | Step 1900 | loss = 0.078, acc = 0.965


 25%|██▌       | 1999/7923 [12:08<39:51,  2.48it/s]

Epoch 4 | Step 2000 | loss = 0.130, acc = 0.955


 26%|██▋       | 2099/7923 [12:45<39:49,  2.44it/s]

Epoch 4 | Step 2100 | loss = 0.123, acc = 0.962


 28%|██▊       | 2199/7923 [13:22<38:35,  2.47it/s]

Epoch 4 | Step 2200 | loss = 0.078, acc = 0.950


 29%|██▉       | 2299/7923 [13:58<38:36,  2.43it/s]

Epoch 4 | Step 2300 | loss = 0.118, acc = 0.945


 30%|███       | 2399/7923 [14:35<37:34,  2.45it/s]

Epoch 4 | Step 2400 | loss = 0.082, acc = 0.950


 32%|███▏      | 2499/7923 [15:12<37:39,  2.40it/s]

Epoch 4 | Step 2500 | loss = 0.104, acc = 0.955


 33%|███▎      | 2599/7923 [15:48<35:48,  2.48it/s]

Epoch 4 | Step 2600 | loss = 0.094, acc = 0.940


 34%|███▍      | 2699/7923 [16:24<34:29,  2.52it/s]

Epoch 4 | Step 2700 | loss = 0.143, acc = 0.933


 35%|███▌      | 2799/7923 [17:00<34:07,  2.50it/s]

Epoch 4 | Step 2800 | loss = 0.089, acc = 0.942


 37%|███▋      | 2899/7923 [17:36<32:31,  2.57it/s]

Epoch 4 | Step 2900 | loss = 0.139, acc = 0.947


 38%|███▊      | 2999/7923 [18:12<33:26,  2.45it/s]

Epoch 4 | Step 3000 | loss = 0.109, acc = 0.960


 39%|███▉      | 3099/7923 [18:48<33:26,  2.40it/s]

Epoch 4 | Step 3100 | loss = 0.161, acc = 0.922


 40%|████      | 3199/7923 [19:25<32:09,  2.45it/s]

Epoch 4 | Step 3200 | loss = 0.093, acc = 0.950


 42%|████▏     | 3299/7923 [20:02<31:25,  2.45it/s]

Epoch 4 | Step 3300 | loss = 0.142, acc = 0.935


 43%|████▎     | 3399/7923 [20:39<30:45,  2.45it/s]

Epoch 4 | Step 3400 | loss = 0.130, acc = 0.940


 44%|████▍     | 3499/7923 [21:15<29:33,  2.49it/s]

Epoch 4 | Step 3500 | loss = 0.126, acc = 0.938


 45%|████▌     | 3599/7923 [21:52<30:06,  2.39it/s]

Epoch 4 | Step 3600 | loss = 0.073, acc = 0.957


 47%|████▋     | 3699/7923 [22:29<28:38,  2.46it/s]

Epoch 4 | Step 3700 | loss = 0.087, acc = 0.950


 48%|████▊     | 3799/7923 [23:06<27:30,  2.50it/s]

Epoch 4 | Step 3800 | loss = 0.119, acc = 0.950


 49%|████▉     | 3899/7923 [23:42<27:34,  2.43it/s]

Epoch 4 | Step 3900 | loss = 0.111, acc = 0.927


 50%|█████     | 3999/7923 [24:19<27:09,  2.41it/s]

Epoch 4 | Step 4000 | loss = 0.154, acc = 0.935


 52%|█████▏    | 4099/7923 [24:55<25:08,  2.53it/s]

Epoch 4 | Step 4100 | loss = 0.115, acc = 0.933


 53%|█████▎    | 4199/7923 [25:31<24:34,  2.53it/s]

Epoch 4 | Step 4200 | loss = 0.109, acc = 0.955


 54%|█████▍    | 4299/7923 [26:07<23:52,  2.53it/s]

Epoch 4 | Step 4300 | loss = 0.089, acc = 0.950


 56%|█████▌    | 4399/7923 [26:43<23:40,  2.48it/s]

Epoch 4 | Step 4400 | loss = 0.122, acc = 0.957


 57%|█████▋    | 4499/7923 [27:18<22:39,  2.52it/s]

Epoch 4 | Step 4500 | loss = 0.142, acc = 0.925


 58%|█████▊    | 4599/7923 [27:55<22:16,  2.49it/s]

Epoch 4 | Step 4600 | loss = 0.098, acc = 0.952


 59%|█████▉    | 4699/7923 [28:32<22:20,  2.40it/s]

Epoch 4 | Step 4700 | loss = 0.146, acc = 0.940


 61%|██████    | 4799/7923 [29:09<21:39,  2.40it/s]

Epoch 4 | Step 4800 | loss = 0.099, acc = 0.955


 62%|██████▏   | 4899/7923 [29:46<20:40,  2.44it/s]

Epoch 4 | Step 4900 | loss = 0.151, acc = 0.945


 63%|██████▎   | 4999/7923 [30:23<19:43,  2.47it/s]

Epoch 4 | Step 5000 | loss = 0.103, acc = 0.942


 64%|██████▍   | 5099/7923 [30:59<19:30,  2.41it/s]

Epoch 4 | Step 5100 | loss = 0.106, acc = 0.952


 66%|██████▌   | 5199/7923 [31:36<18:28,  2.46it/s]

Epoch 4 | Step 5200 | loss = 0.147, acc = 0.947


 67%|██████▋   | 5299/7923 [32:13<18:08,  2.41it/s]

Epoch 4 | Step 5300 | loss = 0.123, acc = 0.947


 68%|██████▊   | 5399/7923 [32:49<16:46,  2.51it/s]

Epoch 4 | Step 5400 | loss = 0.142, acc = 0.933


 69%|██████▉   | 5499/7923 [33:25<16:10,  2.50it/s]

Epoch 4 | Step 5500 | loss = 0.106, acc = 0.957


 71%|███████   | 5599/7923 [34:01<15:42,  2.47it/s]

Epoch 4 | Step 5600 | loss = 0.074, acc = 0.967


 72%|███████▏  | 5699/7923 [34:37<14:40,  2.53it/s]

Epoch 4 | Step 5700 | loss = 0.082, acc = 0.950


 73%|███████▎  | 5799/7923 [35:13<14:27,  2.45it/s]

Epoch 4 | Step 5800 | loss = 0.113, acc = 0.965


 74%|███████▍  | 5899/7923 [35:49<13:38,  2.47it/s]

Epoch 4 | Step 5900 | loss = 0.115, acc = 0.942


 76%|███████▌  | 5999/7923 [36:25<13:14,  2.42it/s]

Epoch 4 | Step 6000 | loss = 0.125, acc = 0.952


 77%|███████▋  | 6099/7923 [37:02<12:21,  2.46it/s]

Epoch 4 | Step 6100 | loss = 0.145, acc = 0.955


 78%|███████▊  | 6199/7923 [37:39<12:01,  2.39it/s]

Epoch 4 | Step 6200 | loss = 0.143, acc = 0.935


 80%|███████▉  | 6299/7923 [38:16<11:13,  2.41it/s]

Epoch 4 | Step 6300 | loss = 0.103, acc = 0.962


 81%|████████  | 6399/7923 [38:53<10:28,  2.43it/s]

Epoch 4 | Step 6400 | loss = 0.103, acc = 0.945


 82%|████████▏ | 6499/7923 [39:30<09:56,  2.39it/s]

Epoch 4 | Step 6500 | loss = 0.068, acc = 0.957


 83%|████████▎ | 6599/7923 [40:06<08:59,  2.45it/s]

Epoch 4 | Step 6600 | loss = 0.149, acc = 0.927


 85%|████████▍ | 6699/7923 [40:43<08:21,  2.44it/s]

Epoch 4 | Step 6700 | loss = 0.059, acc = 0.957


 86%|████████▌ | 6799/7923 [41:19<07:32,  2.48it/s]

Epoch 4 | Step 6800 | loss = 0.083, acc = 0.957


 87%|████████▋ | 6899/7923 [41:56<06:54,  2.47it/s]

Epoch 4 | Step 6900 | loss = 0.153, acc = 0.955


 88%|████████▊ | 6999/7923 [42:31<06:10,  2.49it/s]

Epoch 4 | Step 7000 | loss = 0.106, acc = 0.947


 90%|████████▉ | 7099/7923 [43:07<05:26,  2.52it/s]

Epoch 4 | Step 7100 | loss = 0.082, acc = 0.952


 91%|█████████ | 7199/7923 [43:43<04:53,  2.46it/s]

Epoch 4 | Step 7200 | loss = 0.133, acc = 0.933


 92%|█████████▏| 7299/7923 [44:20<04:10,  2.49it/s]

Epoch 4 | Step 7300 | loss = 0.123, acc = 0.955


 93%|█████████▎| 7399/7923 [44:56<03:32,  2.47it/s]

Epoch 4 | Step 7400 | loss = 0.182, acc = 0.935


 95%|█████████▍| 7499/7923 [45:33<02:52,  2.45it/s]

Epoch 4 | Step 7500 | loss = 0.118, acc = 0.933


 96%|█████████▌| 7599/7923 [46:10<02:09,  2.50it/s]

Epoch 4 | Step 7600 | loss = 0.114, acc = 0.952


 97%|█████████▋| 7699/7923 [46:47<01:28,  2.54it/s]

Epoch 4 | Step 7700 | loss = 0.084, acc = 0.947


 98%|█████████▊| 7799/7923 [47:23<00:49,  2.50it/s]

Epoch 4 | Step 7800 | loss = 0.116, acc = 0.940


100%|█████████▉| 7899/7923 [48:00<00:09,  2.46it/s]

Epoch 4 | Step 7900 | loss = 0.235, acc = 0.920


100%|██████████| 7923/7923 [48:09<00:00,  2.74it/s]


Evaluating Dev Set ...


  6%|▌         | 228/4131 [01:01<18:17,  3.56it/s]

found [UNK] in prediction.
original pred: 李 [UNK]
final prediction 李杲


  9%|▉         | 390/4131 [01:45<16:36,  3.75it/s]

found [UNK] in prediction.
original pred: 朱 允 [UNK]
final prediction 朱允炆


 10%|▉         | 393/4131 [01:46<14:33,  4.28it/s]

found [UNK] in prediction.
original pred: [UNK] 崎 八 幡 宮
final prediction 筥崎八幡宮


 12%|█▏        | 504/4131 [02:15<14:20,  4.22it/s]

found [UNK] in prediction.
original pred: 與 慕 容 [UNK] 雙 方 不 和
final prediction 與慕容廆雙方不和


 17%|█▋        | 700/4131 [03:07<14:42,  3.89it/s]

found [UNK] in prediction.
original pred: 對 日 本 報 紙 的 無 恥 造 謠 誣 [UNK] ， 進 行 了 有 力 駁 斥
final prediction 對日本報紙的無恥造謠誣衊，進行了有力駁斥


 20%|██        | 842/4131 [03:44<12:34,  4.36it/s]

found [UNK] in prediction.
original pred: 杜 恆 - [UNK] 因 論 題
final prediction 杜恆-蒯因論題


 24%|██▍       | 984/4131 [04:22<14:01,  3.74it/s]

found [UNK] in prediction.
original pred: 青 翁 三 足 [UNK]
final prediction 青翁三足缶


 34%|███▎      | 1384/4131 [06:09<11:16,  4.06it/s]

found [UNK] in prediction.
original pred: [UNK] [UNK]
final prediction 蛞蝓


 34%|███▍      | 1401/4131 [06:14<13:42,  3.32it/s]

found [UNK] in prediction.
original pred: 朱 允 [UNK]
final prediction 朱允炆


 37%|███▋      | 1512/4131 [06:44<11:44,  3.72it/s]

found [UNK] in prediction.
original pred: 為 免 再 次 爆 發 內 [UNK]
final prediction 為免再次爆發內訌


 45%|████▍     | 1846/4131 [08:16<08:39,  4.40it/s]

found [UNK] in prediction.
original pred: [UNK] 船
final prediction 艚船


 50%|█████     | 2081/4131 [09:19<10:13,  3.34it/s]

found [UNK] in prediction.
original pred: [UNK] 神 星
final prediction 鬩神星


 53%|█████▎    | 2205/4131 [09:51<08:28,  3.79it/s]

found [UNK] in prediction.
original pred: 朱 載 [UNK]
final prediction 朱載堉


 55%|█████▍    | 2262/4131 [10:07<08:29,  3.67it/s]

found [UNK] in prediction.
original pred: 學 習 訓 [UNK] 學
final prediction 那裏學習訓


 58%|█████▊    | 2399/4131 [10:43<07:08,  4.04it/s]

found [UNK] in prediction.
original pred: 朱 允 [UNK] 的 禁 殺 之 旨
final prediction 朱允炆的禁殺之旨


 64%|██████▍   | 2662/4131 [11:52<06:35,  3.71it/s]

found [UNK] in prediction.
original pred: 李 端 [UNK]
final prediction 李端棻


 67%|██████▋   | 2752/4131 [12:14<05:44,  4.00it/s]

found [UNK] in prediction.
original pred: w. v. [UNK] 因
final prediction W.V.蒯因


 71%|███████   | 2937/4131 [13:03<06:37,  3.01it/s]

found [UNK] in prediction.
original pred: 拓 跋 [UNK]
final prediction 拓跋燾


 77%|███████▋  | 3167/4131 [14:01<02:52,  5.58it/s]

found [UNK] in prediction.
original pred: [UNK] [UNK]
final prediction 蛞蝓


 77%|███████▋  | 3168/4131 [14:02<02:53,  5.55it/s]

found [UNK] in prediction.
original pred: [UNK] [UNK]
final prediction 蛞蝓


 90%|█████████ | 3718/4131 [16:09<01:17,  5.35it/s]

found [UNK] in prediction.
original pred: 拓 跋 [UNK]
final prediction 拓跋燾


 96%|█████████▋| 3982/4131 [17:08<00:35,  4.20it/s]

found [UNK] in prediction.
original pred: 到 霍 山 陳 梅 婷 那 裏 學 習 訓 [UNK] 學
final prediction 到霍山陳梅婷那裏學習訓詁學


100%|██████████| 4131/4131 [17:43<00:00,  3.88it/s]


Validation | Epoch 4 | acc = 0.792


  1%|          | 99/7923 [00:36<51:34,  2.53it/s]

Epoch 5 | Step 100 | loss = 0.084, acc = 0.952


  3%|▎         | 199/7923 [01:12<52:38,  2.45it/s]

Epoch 5 | Step 200 | loss = 0.059, acc = 0.967


  4%|▍         | 299/7923 [01:48<52:59,  2.40it/s]

Epoch 5 | Step 300 | loss = 0.076, acc = 0.962


  5%|▌         | 399/7923 [02:24<49:53,  2.51it/s]

Epoch 5 | Step 400 | loss = 0.050, acc = 0.975


  6%|▋         | 499/7923 [03:00<49:15,  2.51it/s]

Epoch 5 | Step 500 | loss = 0.093, acc = 0.957


  8%|▊         | 599/7923 [03:37<50:11,  2.43it/s]

Epoch 5 | Step 600 | loss = 0.077, acc = 0.967


  9%|▉         | 699/7923 [04:13<49:05,  2.45it/s]

Epoch 5 | Step 700 | loss = 0.050, acc = 0.962


 10%|█         | 799/7923 [04:50<48:14,  2.46it/s]

Epoch 5 | Step 800 | loss = 0.080, acc = 0.977


 11%|█▏        | 899/7923 [05:27<47:24,  2.47it/s]

Epoch 5 | Step 900 | loss = 0.073, acc = 0.977


 13%|█▎        | 999/7923 [06:04<46:36,  2.48it/s]

Epoch 5 | Step 1000 | loss = 0.057, acc = 0.975


 14%|█▍        | 1099/7923 [06:40<44:37,  2.55it/s]

Epoch 5 | Step 1100 | loss = 0.077, acc = 0.970


 15%|█▌        | 1199/7923 [07:16<44:15,  2.53it/s]

Epoch 5 | Step 1200 | loss = 0.047, acc = 0.975


 16%|█▋        | 1299/7923 [07:53<44:55,  2.46it/s]

Epoch 5 | Step 1300 | loss = 0.059, acc = 0.967


 18%|█▊        | 1399/7923 [08:29<44:44,  2.43it/s]

Epoch 5 | Step 1400 | loss = 0.121, acc = 0.960


 19%|█▉        | 1499/7923 [09:04<43:09,  2.48it/s]

Epoch 5 | Step 1500 | loss = 0.113, acc = 0.950


 20%|██        | 1599/7923 [09:41<43:15,  2.44it/s]

Epoch 5 | Step 1600 | loss = 0.102, acc = 0.972


 21%|██▏       | 1699/7923 [10:17<41:46,  2.48it/s]

Epoch 5 | Step 1700 | loss = 0.070, acc = 0.967


 23%|██▎       | 1799/7923 [10:54<41:12,  2.48it/s]

Epoch 5 | Step 1800 | loss = 0.078, acc = 0.960


 24%|██▍       | 1899/7923 [11:30<41:01,  2.45it/s]

Epoch 5 | Step 1900 | loss = 0.082, acc = 0.960


 25%|██▌       | 1999/7923 [12:07<42:24,  2.33it/s]

Epoch 5 | Step 2000 | loss = 0.053, acc = 0.980


 26%|██▋       | 2099/7923 [12:44<40:28,  2.40it/s]

Epoch 5 | Step 2100 | loss = 0.131, acc = 0.955


 28%|██▊       | 2199/7923 [13:21<39:16,  2.43it/s]

Epoch 5 | Step 2200 | loss = 0.063, acc = 0.967


 29%|██▉       | 2299/7923 [13:57<37:55,  2.47it/s]

Epoch 5 | Step 2300 | loss = 0.092, acc = 0.962


 30%|███       | 2399/7923 [14:34<38:03,  2.42it/s]

Epoch 5 | Step 2400 | loss = 0.067, acc = 0.967


 32%|███▏      | 2499/7923 [15:09<35:44,  2.53it/s]

Epoch 5 | Step 2500 | loss = 0.075, acc = 0.965


 33%|███▎      | 2599/7923 [15:45<35:42,  2.48it/s]

Epoch 5 | Step 2600 | loss = 0.051, acc = 0.970


 34%|███▍      | 2699/7923 [16:21<34:48,  2.50it/s]

Epoch 5 | Step 2700 | loss = 0.056, acc = 0.977


 35%|███▌      | 2799/7923 [16:57<35:23,  2.41it/s]

Epoch 5 | Step 2800 | loss = 0.075, acc = 0.965


 37%|███▋      | 2899/7923 [17:34<33:24,  2.51it/s]

Epoch 5 | Step 2900 | loss = 0.059, acc = 0.975


 38%|███▊      | 2999/7923 [18:11<33:01,  2.49it/s]

Epoch 5 | Step 3000 | loss = 0.080, acc = 0.970


 39%|███▉      | 3099/7923 [18:48<32:30,  2.47it/s]

Epoch 5 | Step 3100 | loss = 0.045, acc = 0.972


 40%|████      | 3199/7923 [19:24<32:58,  2.39it/s]

Epoch 5 | Step 3200 | loss = 0.046, acc = 0.965


 42%|████▏     | 3299/7923 [20:01<31:46,  2.43it/s]

Epoch 5 | Step 3300 | loss = 0.078, acc = 0.967


 43%|████▎     | 3399/7923 [20:38<30:50,  2.44it/s]

Epoch 5 | Step 3400 | loss = 0.071, acc = 0.967


 44%|████▍     | 3499/7923 [21:15<30:05,  2.45it/s]

Epoch 5 | Step 3500 | loss = 0.063, acc = 0.970


 45%|████▌     | 3599/7923 [21:51<29:28,  2.45it/s]

Epoch 5 | Step 3600 | loss = 0.100, acc = 0.967


 47%|████▋     | 3699/7923 [22:27<28:51,  2.44it/s]

Epoch 5 | Step 3700 | loss = 0.100, acc = 0.967


 48%|████▊     | 3799/7923 [23:03<28:30,  2.41it/s]

Epoch 5 | Step 3800 | loss = 0.093, acc = 0.955


 49%|████▉     | 3899/7923 [23:39<26:34,  2.52it/s]

Epoch 5 | Step 3900 | loss = 0.052, acc = 0.972


 50%|█████     | 3999/7923 [24:15<26:40,  2.45it/s]

Epoch 5 | Step 4000 | loss = 0.079, acc = 0.967


 52%|█████▏    | 4099/7923 [24:51<25:53,  2.46it/s]

Epoch 5 | Step 4100 | loss = 0.084, acc = 0.970


 53%|█████▎    | 4199/7923 [25:28<25:38,  2.42it/s]

Epoch 5 | Step 4200 | loss = 0.099, acc = 0.957


 54%|█████▍    | 4299/7923 [26:04<24:00,  2.52it/s]

Epoch 5 | Step 4300 | loss = 0.110, acc = 0.965


 56%|█████▌    | 4399/7923 [26:41<24:20,  2.41it/s]

Epoch 5 | Step 4400 | loss = 0.066, acc = 0.960


 57%|█████▋    | 4499/7923 [27:18<23:30,  2.43it/s]

Epoch 5 | Step 4500 | loss = 0.114, acc = 0.965


 58%|█████▊    | 4599/7923 [27:55<22:50,  2.42it/s]

Epoch 5 | Step 4600 | loss = 0.097, acc = 0.962


 59%|█████▉    | 4699/7923 [28:32<22:16,  2.41it/s]

Epoch 5 | Step 4700 | loss = 0.027, acc = 0.982


 61%|██████    | 4799/7923 [29:08<21:44,  2.40it/s]

Epoch 5 | Step 4800 | loss = 0.110, acc = 0.965


 62%|██████▏   | 4899/7923 [29:45<20:39,  2.44it/s]

Epoch 5 | Step 4900 | loss = 0.060, acc = 0.960


 63%|██████▎   | 4999/7923 [30:21<19:25,  2.51it/s]

Epoch 5 | Step 5000 | loss = 0.053, acc = 0.965


 64%|██████▍   | 5099/7923 [30:57<18:23,  2.56it/s]

Epoch 5 | Step 5100 | loss = 0.082, acc = 0.952


 66%|██████▌   | 5199/7923 [31:33<18:47,  2.42it/s]

Epoch 5 | Step 5200 | loss = 0.119, acc = 0.960


 67%|██████▋   | 5299/7923 [32:09<17:06,  2.56it/s]

Epoch 5 | Step 5300 | loss = 0.069, acc = 0.970


 68%|██████▊   | 5399/7923 [32:45<17:31,  2.40it/s]

Epoch 5 | Step 5400 | loss = 0.077, acc = 0.960


 69%|██████▉   | 5499/7923 [33:22<16:11,  2.50it/s]

Epoch 5 | Step 5500 | loss = 0.085, acc = 0.955


 71%|███████   | 5599/7923 [33:58<15:48,  2.45it/s]

Epoch 5 | Step 5600 | loss = 0.146, acc = 0.952


 72%|███████▏  | 5699/7923 [34:35<14:57,  2.48it/s]

Epoch 5 | Step 5700 | loss = 0.124, acc = 0.955


 73%|███████▎  | 5799/7923 [35:12<14:34,  2.43it/s]

Epoch 5 | Step 5800 | loss = 0.048, acc = 0.977


 74%|███████▍  | 5899/7923 [35:48<13:40,  2.47it/s]

Epoch 5 | Step 5900 | loss = 0.044, acc = 0.975


 76%|███████▌  | 5999/7923 [36:25<13:10,  2.43it/s]

Epoch 5 | Step 6000 | loss = 0.084, acc = 0.975


 77%|███████▋  | 6099/7923 [37:02<12:21,  2.46it/s]

Epoch 5 | Step 6100 | loss = 0.127, acc = 0.947


 78%|███████▊  | 6199/7923 [37:39<11:25,  2.52it/s]

Epoch 5 | Step 6200 | loss = 0.077, acc = 0.962


 80%|███████▉  | 6299/7923 [38:16<11:22,  2.38it/s]

Epoch 5 | Step 6300 | loss = 0.105, acc = 0.945


 81%|████████  | 6399/7923 [38:52<10:26,  2.43it/s]

Epoch 5 | Step 6400 | loss = 0.110, acc = 0.947


 82%|████████▏ | 6499/7923 [39:29<09:42,  2.44it/s]

Epoch 5 | Step 6500 | loss = 0.108, acc = 0.955


 83%|████████▎ | 6599/7923 [40:06<09:01,  2.45it/s]

Epoch 5 | Step 6600 | loss = 0.107, acc = 0.960


 85%|████████▍ | 6699/7923 [40:43<08:34,  2.38it/s]

Epoch 5 | Step 6700 | loss = 0.115, acc = 0.955


 86%|████████▌ | 6799/7923 [41:19<07:39,  2.45it/s]

Epoch 5 | Step 6800 | loss = 0.126, acc = 0.962


 87%|████████▋ | 6899/7923 [41:56<06:45,  2.52it/s]

Epoch 5 | Step 6900 | loss = 0.051, acc = 0.977


 88%|████████▊ | 6999/7923 [42:32<06:14,  2.47it/s]

Epoch 5 | Step 7000 | loss = 0.126, acc = 0.957


 90%|████████▉ | 7099/7923 [43:09<05:49,  2.36it/s]

Epoch 5 | Step 7100 | loss = 0.050, acc = 0.972


 91%|█████████ | 7199/7923 [43:45<04:56,  2.44it/s]

Epoch 5 | Step 7200 | loss = 0.059, acc = 0.972


 92%|█████████▏| 7299/7923 [44:21<04:08,  2.52it/s]

Epoch 5 | Step 7300 | loss = 0.084, acc = 0.957


 93%|█████████▎| 7399/7923 [44:57<03:26,  2.53it/s]

Epoch 5 | Step 7400 | loss = 0.119, acc = 0.947


 95%|█████████▍| 7499/7923 [45:33<02:52,  2.46it/s]

Epoch 5 | Step 7500 | loss = 0.099, acc = 0.955


 96%|█████████▌| 7599/7923 [46:08<02:08,  2.52it/s]

Epoch 5 | Step 7600 | loss = 0.106, acc = 0.952


 97%|█████████▋| 7699/7923 [46:44<01:29,  2.50it/s]

Epoch 5 | Step 7700 | loss = 0.078, acc = 0.970


 98%|█████████▊| 7799/7923 [47:20<00:50,  2.45it/s]

Epoch 5 | Step 7800 | loss = 0.081, acc = 0.965


100%|█████████▉| 7899/7923 [47:57<00:09,  2.41it/s]

Epoch 5 | Step 7900 | loss = 0.108, acc = 0.957


100%|██████████| 7923/7923 [48:06<00:00,  2.75it/s]


Evaluating Dev Set ...


  6%|▌         | 228/4131 [01:02<18:18,  3.55it/s]

found [UNK] in prediction.
original pred: 李 [UNK]
final prediction 李杲


  9%|▉         | 390/4131 [01:46<16:46,  3.72it/s]

found [UNK] in prediction.
original pred: 朱 允 [UNK]
final prediction 朱允炆


 10%|▉         | 393/4131 [01:47<14:50,  4.20it/s]

found [UNK] in prediction.
original pred: [UNK] 崎 八 幡 宮
final prediction 筥崎八幡宮


 12%|█▏        | 504/4131 [02:17<15:08,  3.99it/s]

found [UNK] in prediction.
original pred: 與 慕 容 [UNK] 雙 方 不 和
final prediction 與慕容廆雙方不和


 20%|██        | 842/4131 [03:48<13:07,  4.17it/s]

found [UNK] in prediction.
original pred: 杜 恆 - [UNK] 因 論 題
final prediction 杜恆-蒯因論題


 24%|██▍       | 984/4131 [04:27<13:59,  3.75it/s]

found [UNK] in prediction.
original pred: 青 翁 三 足 [UNK]
final prediction 青翁三足缶


 34%|███▎      | 1384/4131 [06:14<11:02,  4.15it/s]

found [UNK] in prediction.
original pred: [UNK] [UNK]
final prediction 蛞蝓


 34%|███▍      | 1401/4131 [06:19<12:59,  3.50it/s]

found [UNK] in prediction.
original pred: 朱 允 [UNK]
final prediction 朱允炆


 35%|███▍      | 1443/4131 [06:31<12:48,  3.50it/s]

found [UNK] in prediction.
original pred: 劉 [UNK]
final prediction 劉炟


 37%|███▋      | 1512/4131 [06:49<11:36,  3.76it/s]

found [UNK] in prediction.
original pred: 免 再 次 爆 發 內 [UNK]
final prediction 免再次爆發內訌


 45%|████▍     | 1846/4131 [08:18<08:15,  4.61it/s]

found [UNK] in prediction.
original pred: [UNK] 船
final prediction 艚船


 53%|█████▎    | 2205/4131 [09:51<08:31,  3.77it/s]

found [UNK] in prediction.
original pred: 朱 載 [UNK]
final prediction 朱載堉


 55%|█████▍    | 2263/4131 [10:07<07:33,  4.12it/s]

found [UNK] in prediction.
original pred: 學 習 訓 [UNK] 學
final prediction 那裏學習訓


 64%|██████▍   | 2662/4131 [11:54<06:56,  3.53it/s]

found [UNK] in prediction.
original pred: 李 端 [UNK]
final prediction 李端棻


 67%|██████▋   | 2752/4131 [12:18<05:54,  3.89it/s]

found [UNK] in prediction.
original pred: w. v. [UNK] 因
final prediction W.V.蒯因


 71%|███████   | 2937/4131 [13:08<06:54,  2.88it/s]

found [UNK] in prediction.
original pred: 拓 跋 [UNK]
final prediction 拓跋燾


 77%|███████▋  | 3168/4131 [14:08<03:02,  5.28it/s]

found [UNK] in prediction.
original pred: [UNK] [UNK]
final prediction 蛞蝓


 90%|█████████ | 3718/4131 [16:14<01:15,  5.45it/s]

found [UNK] in prediction.
original pred: 拓 跋 [UNK]
final prediction 拓跋燾


100%|██████████| 4131/4131 [17:46<00:00,  3.87it/s]


Validation | Epoch 5 | acc = 0.793
Saving Model ...


## Testing

In [12]:
print("Evaluating Test Set ...")

result = []

model.eval()
with torch.no_grad():
    for i, data in enumerate(tqdm(test_loader)):
        output = model(input_ids=data[0].squeeze(dim=0).to(device), token_type_ids=data[1].squeeze(dim=0).to(device),
                       attention_mask=data[2].squeeze(dim=0).to(device))
        result.append(evaluate(data, output, doc_stride, test_paragraphs[test_questions[i]['paragraph_id']],
                               test_paragraphs_tokenized[test_questions[i]['paragraph_id']].tokens))

result_file = "result04232.csv"
with open(result_file, 'w') as f:	
	  f.write("ID,Answer\n")
	  for i, test_question in enumerate(test_questions):
        # Replace commas in answers with empty strings (since csv is separated by comma)
        # Answers in kaggle are processed in the same way
		    f.write(f"{test_question['id']},{result[i].replace(',','')}\n")

print(f"Completed! Result is in {result_file}")

Evaluating Test Set ...


  5%|▌         | 252/4957 [01:06<20:04,  3.91it/s]

found [UNK] in prediction.
original pred: 溥 [UNK]
final prediction 溥儁


 11%|█▏        | 564/4957 [02:23<15:08,  4.84it/s]

found [UNK] in prediction.
original pred: 馬 [UNK]
final prediction 馬馼


 13%|█▎        | 636/4957 [02:43<19:30,  3.69it/s]

found [UNK] in prediction.
original pred: 東 晉 常 [UNK]
final prediction ，東晉常


 19%|█▉        | 939/4957 [04:03<15:15,  4.39it/s]

found [UNK] in prediction.
original pred: [UNK] 稻
final prediction 秈稻


 20%|██        | 992/4957 [04:16<15:07,  4.37it/s]

found [UNK] in prediction.
original pred: 白 [UNK] 紀 滅 絕 事 件
final prediction 白堊紀滅絕事件


 29%|██▊       | 1424/4957 [06:09<13:23,  4.40it/s]

found [UNK] in prediction.
original pred: 抗 佝 [UNK] 病 維 他 命
final prediction 抗佝僂病維他命


 31%|███       | 1535/4957 [06:39<14:01,  4.07it/s]

found [UNK] in prediction.
original pred: 杭 州 [UNK] 橋 機 場
final prediction 襲杭州筧橋機


 31%|███▏      | 1558/4957 [06:45<12:47,  4.43it/s]

found [UNK] in prediction.
original pred: 蔡 [UNK]
final prediction 蔡鍔


 34%|███▍      | 1681/4957 [07:16<13:59,  3.90it/s]

found [UNK] in prediction.
original pred: 丁 [UNK]
final prediction 丁旿


 35%|███▌      | 1751/4957 [07:35<15:33,  3.43it/s]

found [UNK] in prediction.
original pred: 隋 [UNK] 帝
final prediction 隋煬帝


 39%|███▊      | 1914/4957 [08:17<15:35,  3.25it/s]

found [UNK] in prediction.
original pred: 胡 季 [UNK]
final prediction 。其中


 41%|████      | 2043/4957 [08:51<11:12,  4.34it/s]

found [UNK] in prediction.
original pred: 其 英 文 縮 寫 首 字 母 為 「 [UNK] · ㄎㄟ · ㄨㄞ 」
final prediction 其英文縮寫首字母為「ㄟㄙ·ㄎㄟ·ㄨㄞ


 48%|████▊     | 2400/4957 [10:23<10:41,  3.98it/s]

found [UNK] in prediction.
original pred: 梁 [UNK]
final prediction 梁鵠


 51%|█████     | 2514/4957 [10:52<09:14,  4.41it/s]

found [UNK] in prediction.
original pred: [UNK] 靼 海 峽
final prediction 韃靼海峽


 53%|█████▎    | 2621/4957 [11:21<09:24,  4.14it/s]

found [UNK] in prediction.
original pred: 白 [UNK] 紀 末 滅 絕 事 件
final prediction 白堊紀末滅絕事件


 56%|█████▌    | 2769/4957 [12:00<08:46,  4.16it/s]

found [UNK] in prediction.
original pred: 侏 [UNK] 紀
final prediction 侏儸紀


 61%|██████    | 3027/4957 [13:08<09:15,  3.47it/s]

found [UNK] in prediction.
original pred: 克 里 米 亞 [UNK] 靼 人
final prediction 克里米亞韃靼人


 68%|██████▊   | 3370/4957 [14:37<06:03,  4.36it/s]

found [UNK] in prediction.
original pred: 白 [UNK] 紀 中 期
final prediction 白堊紀中期


 73%|███████▎  | 3604/4957 [15:31<04:05,  5.51it/s]

found [UNK] in prediction.
original pred: 白 [UNK] 紀
final prediction 白堊紀


 73%|███████▎  | 3608/4957 [15:32<03:48,  5.91it/s]

found [UNK] in prediction.
original pred: 白 [UNK] 紀
final prediction 白堊紀


100%|██████████| 4957/4957 [21:19<00:00,  3.87it/s]

Completed! Result is in result04232.csv



