<a href="https://colab.research.google.com/github/dotsnangles/NMT-with-transformers/blob/master/training_mT5-small.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!nvidia-smi

Sat Aug  6 14:25:55 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  NVIDIA A10G         On   | 00000000:00:1E.0 Off |                    0 |
|  0%   28C    P8    15W / 300W |      0MiB / 23028MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+---------------------------------------------------------------------------

### Set notebook parameters

In [2]:
run_name = 'ko2en run on aws ec2 with NVIDIA A10G with the domain data'

project_name = 'ko2en-translator-mt5-small-with-the-domain-data'

num_train_epochs = 30
batch_size = 8
gradient_accumulation_steps = 2

learning_rate = 2e-5
weight_decay = 0.01

lr_scheduler_type = 'cosine'
warmup_ratio = 0.1

predict_with_generate = False
generation_max_length = 256

# early_stopping_patience = 5
save_total_limit = 5

load_best_model_at_end = False
metric_for_best_model = 'eval_loss'

save_strategy = "epoch"
evaluation_strategy = "epoch"
# save_steps = 1250
# eval_steps = 1250

logging_strategy = "steps"
logging_first_step = True 
logging_steps = 500

fp16 = False
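
As a rough sanity check on these settings, the sketch below works out the effective batch size and the number of optimizer steps. It assumes a single GPU and uses the 319,551 training rows reported further down in this notebook. The helper name `training_steps` is hypothetical, and the rounding is chosen to reproduce what the Trainer log reports rather than to quote the library's own code.

```python
import math

def training_steps(num_examples, batch_size, grad_accum, epochs):
    """Return (effective_batch, steps_per_epoch, total_steps) for one device."""
    effective_batch = batch_size * grad_accum
    # The last mini-batch of an epoch may be partial, hence the ceiling.
    batches_per_epoch = math.ceil(num_examples / batch_size)
    # One optimizer step is taken per full accumulation window.
    steps_per_epoch = batches_per_epoch // grad_accum
    return effective_batch, steps_per_epoch, steps_per_epoch * epochs

print(training_steps(319551, batch_size=8, grad_accum=2, epochs=30))
# (16, 19972, 599160)
```

These figures agree with the `Total train batch size = 16` and `Total optimization steps = 599160` lines in the training log, and with the first checkpoint being named `checkpoint-19972`.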

### Prerequisites

In [3]:
# !conda install -c conda-forge datasets transformers sentencepiece sacrebleu folium wandb pandas gdown jupyterlab ipywidgets papermill

In [4]:
# import gdown
# id = "1J21-T8wYjlj-91CxtxEzrcE34CDt7CM3"
# gdown.download_folder(id=id, quiet=True, use_cookies=False)

### Set up WandB

In [5]:
%env WANDB_NOTEBOOK_NAME=/home/ubuntu/codes/NMT-with-transformers/training_mT5_small_domain_ko2en.ipynb
%env WANDB_PROJECT=$project_name
%env WANDB_LOG_MODEL=true
%env WANDB_WATCH=all

env: WANDB_NOTEBOOK_NAME=/home/ubuntu/codes/NMT-with-transformers/training_mT5_small_domain_ko2en.ipynb
env: WANDB_PROJECT=ko2en-translator-mt5-small-with-the-domain-data
env: WANDB_LOG_MODEL=true
env: WANDB_WATCH=all


In [6]:
import wandb
wandb.login()

[34m[1mwandb[0m: Currently logged in as: [33mdotsnangles[0m. Use [1m`wandb login --relogin`[0m to force relogin


True

### Model Selection

In [7]:
model_ckpt = 'google/mt5-small'

### Imports, data and tokenizer

In [8]:
import pandas as pd
from datasets import Dataset, load_metric
from transformers import (
    AutoConfig,
    AutoTokenizer,
    AutoModelForSeq2SeqLM,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainingArguments,
    Seq2SeqTrainer,
    EarlyStoppingCallback,
)

In [9]:
train_df = pd.read_csv('./data/train_domain.csv')
val_df = pd.read_csv('./data/val_domain.csv')

In [10]:
tokenizer = AutoTokenizer.from_pretrained(model_ckpt, use_fast=False)

### Measure token length

In [11]:
def measure_len(sample):
    return len(tokenizer.encode(sample))

In [12]:
src_prefix = "translate Korean to English: "

print('length of src_prefix:', measure_len(src_prefix))
print(tokenizer.encode(src_prefix))
with tokenizer.as_target_tokenizer():
    print(tokenizer.encode(src_prefix))

length of src_prefix: 7
[37194, 259, 37209, 288, 5413, 267, 1]
[37194, 259, 37209, 288, 5413, 267, 1]


In [13]:
# train_df_ko_len = train_df['ko'].apply(measure_len)
# train_df_en_len = train_df['en'].apply(measure_len)
# val_df_ko_len = val_df['ko'].apply(measure_len)
# val_df_en_len = val_df['en'].apply(measure_len)

In [14]:
# max(train_df_ko_len)+7, max(train_df_en_len),  max(val_df_ko_len)+7, max(val_df_en_len)
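
The commented-out cells above estimate the longest source and target sequences, adding 7 tokens for the prefix. A minimal sketch of the same idea follows, written against any `encode` callable so it can be tried without downloading the tokenizer. The function name `max_token_length` and the whitespace encoder are hypothetical stand-ins. Note that the measured prefix length of 7 already includes the end-of-sequence id `1`, so adding 7 to a length that also counts its own end-of-sequence token is a slightly conservative upper bound.

```python
def max_token_length(texts, encode, extra_tokens=0):
    """Longest encoded length over `texts`, plus a fixed allowance (e.g. a prefix)."""
    return max(len(encode(text)) for text in texts) + extra_tokens

# Toy encoder standing in for tokenizer.encode: whitespace tokens plus an EOS marker.
def toy_encode(text):
    return text.split() + ["</s>"]

print(max_token_length(["a b c", "d"], toy_encode, extra_tokens=7))
```

With the real tokenizer, `encode` would be `tokenizer.encode`, and the result can guide the `max_length` values used during preprocessing.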

### Convert DataFrames to Datasets

In [15]:
train_ds = Dataset.from_pandas(train_df[['ko', 'en']])
val_ds = Dataset.from_pandas(val_df[['ko', 'en']])
# .shuffle(seed=42)[:val_ds_len]
# val_ds = Dataset.from_dict(val_ds)
train_ds, val_ds

(Dataset({
     features: ['ko', 'en'],
     num_rows: 319551
 }),
 Dataset({
     features: ['ko', 'en'],
     num_rows: 40359
 }))

In [16]:
idx = 0
for e in train_ds:
    print(e)
    idx += 1
    if idx == 2:
        break

{'ko': '비교기(1235 및 1237)는 설정에 따라 Relu 활성함수로 나타낼 수 있으며, 시그모이드 함수로 나타낼 수도 있다.', 'en': 'The comparators 1235 and 1237 may be expressed as a Relu activation function or a sigmoid function according to a setting.'}
{'ko': '서버(320)는 분석 모델 DB(325)에 소스 영상을 입력하고, 학습 모델에서 출력하는 객체 정보를 수신할 수 있다.', 'en': 'The server 320 may input a source image to the analysis model DB 325 and receive object information output from the training model.'}


In [17]:
idx = 0
for e in val_ds:
    print(e)
    idx += 1
    if idx == 2:
        break

{'ko': '상기 제1 지점 및 상기 제2 지점은 표시 장치(160) 내에 위치하는 서로 상이한 지점들을 포함할 수 있다.', 'en': 'The first point and the second point may include different points located in the display device 160.'}
{'ko': '시선 추적부(206)가 딥러닝 모델(210)을 이용하여 사용자의 시선을 보다 정확히 추적하기 위해서는 딥러닝 모델(210)의 학습용 데이터, 즉 시선 추적을 위한 학습 데이터의 신뢰도가 높아야 한다.', 'en': "In order for the eye tracking unit 206 to more accurately track the user's gaze using the deep learning model 210, the learning data for the deep learning model 210, that is, the training data for eye tracking, should have high reliability."}


### Preprocess

In [18]:
source_lang = "ko"
target_lang = "en"
prefix = "translate Korean to English: "

def preprocess_function(examples):
    inputs = [prefix + example for example in examples[source_lang]]
    targets = [example for example in examples[target_lang]]
    model_inputs = tokenizer(inputs, max_length=162, truncation=True)

    with tokenizer.as_target_tokenizer():
        labels = tokenizer(targets, max_length=122, truncation=True)

    model_inputs["labels"] = labels["input_ids"]
    return model_inputs
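
Newer `transformers` releases deprecate `as_target_tokenizer()` in favour of a `text_target` argument on the tokenizer call. The sketch below shows an equivalent of `preprocess_function` under that assumption. The function name `preprocess_with_text_target` and its keyword parameters are hypothetical, and the tokenizer is passed in explicitly so the function does not depend on notebook globals. The tokenizer is called twice so that source and target keep their separate length limits.

```python
def preprocess_with_text_target(
    examples,
    tokenizer,
    prefix="translate Korean to English: ",
    max_source_length=162,
    max_target_length=122,
):
    """Tokenize sources and targets without the as_target_tokenizer() context manager."""
    inputs = [prefix + text for text in examples["ko"]]
    model_inputs = tokenizer(inputs, max_length=max_source_length, truncation=True)
    # Assumes a transformers version whose tokenizers accept `text_target`.
    labels = tokenizer(
        text_target=examples["en"], max_length=max_target_length, truncation=True
    )
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs
```

For mT5 the source and target vocabularies are the same, which is consistent with the identical ids printed for the prefix in both modes earlier in the notebook.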

#### Test preprocess_function

In [19]:
train_ds[:3]

{'ko': ['비교기(1235 및 1237)는 설정에 따라 Relu 활성함수로 나타낼 수 있으며, 시그모이드 함수로 나타낼 수도 있다.',
  '서버(320)는 분석 모델 DB(325)에 소스 영상을 입력하고, 학습 모델에서 출력하는 객체 정보를 수신할 수 있다.',
  '상기 회전을 결정하는 단계는 상기 얼굴 이미지에 포함된 눈의 모양에 기초하여 상기 디스플레이 영상의 회전을 결정할 수 있다.'],
 'en': ['The comparators 1235 and 1237 may be expressed as a Relu activation function or a sigmoid function according to a setting.',
  'The server 320 may input a source image to the analysis model DB 325 and receive object information output from the training model.',
  'The step of determining the rotation may include determining the rotation of the display image based on a shape of an eye included in the face image.']}

In [20]:
preprocess_test = preprocess_function(train_ds[:3])
print('input id', preprocess_test.input_ids[0])
print(tokenizer.decode(preprocess_test.input_ids[0]), '\n')
print('attention mask', preprocess_test.attention_mask[0], '\n')
print('label', preprocess_test.labels[0])
print(tokenizer.decode(preprocess_test.labels[0]))

input id [37194, 259, 37209, 288, 5413, 267, 259, 53789, 1622, 312, 175510, 259, 5593, 644, 101294, 988, 30957, 118645, 259, 18490, 788, 1696, 63019, 3353, 12482, 2277, 1235, 49303, 125462, 1566, 3083, 19023, 261, 6463, 11051, 6763, 63362, 15331, 2277, 1235, 49303, 125462, 259, 44830, 3632, 260, 1]
translate Korean to English: 비교기(1235 및 1237)는 설정에 따라 Relu 활성함수로 나타낼 수 있으며, 시그모이드 함수로 나타낼 수도 있다.</s> 

attention mask [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1] 

label [486, 39959, 19002, 259, 175510, 305, 259, 162249, 1432, 390, 17385, 345, 527, 259, 262, 788, 1696, 259, 97359, 2835, 631, 259, 262, 2002, 1233, 525, 2835, 259, 18775, 288, 259, 262, 36577, 260, 1]
The comparators 1235 and 1237 may be expressed as a Relu activation function or a sigmoid function according to a setting.</s>


In [21]:
tokenized_train = train_ds.map(preprocess_function, batched=True)
tokenized_val = val_ds.map(preprocess_function, batched=True)
tokenized_train, tokenized_val

  0%|          | 0/320 [00:00<?, ?ba/s]

  0%|          | 0/41 [00:00<?, ?ba/s]

(Dataset({
     features: ['ko', 'en', 'input_ids', 'attention_mask', 'labels'],
     num_rows: 319551
 }),
 Dataset({
     features: ['ko', 'en', 'input_ids', 'attention_mask', 'labels'],
     num_rows: 40359
 }))

### Load metric

In [22]:
metric = load_metric("sacrebleu")

In [23]:
import numpy as np

def postprocess_text(preds, labels):
    preds = [pred.strip() for pred in preds]
    labels = [[label.strip()] for label in labels]

    return preds, labels

def compute_metrics(eval_preds):
    preds, labels = eval_preds
    if isinstance(preds, tuple):
        preds = preds[0]
    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)

    # Replace -100 in the labels as we can't decode them.
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)

    # Some simple post-processing
    decoded_preds, decoded_labels = postprocess_text(decoded_preds, decoded_labels)

    result = metric.compute(predictions=decoded_preds, references=decoded_labels)
    result = {"bleu": result["score"]}

    prediction_lens = [np.count_nonzero(pred != tokenizer.pad_token_id) for pred in preds]
    result["gen_len"] = np.mean(prediction_lens)
    result = {k: round(v, 4) for k, v in result.items()}
    return result
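
The `-100` handling in `compute_metrics` exists because `DataCollatorForSeq2Seq` pads label sequences with `-100` by default, so that padded positions are ignored by the loss. The following is a simplified pure-Python illustration of that round trip, not the library's code. The helper names are hypothetical, and a pad token id of `0` is assumed here for illustration.

```python
IGNORE_INDEX = -100  # label value skipped by the cross-entropy loss

def pad_labels(batch, pad_value=IGNORE_INDEX):
    """Right-pad every label sequence to the longest one in the batch."""
    width = max(len(seq) for seq in batch)
    return [seq + [pad_value] * (width - len(seq)) for seq in batch]

def restore_pad_ids(batch, pad_token_id, ignore_index=IGNORE_INDEX):
    """Swap the ignore index back to a real token id so the batch can be decoded."""
    return [[pad_token_id if tok == ignore_index else tok for tok in seq] for seq in batch]

padded = pad_labels([[486, 260, 1], [305, 1]])
print(padded)
print(restore_pad_ids(padded, pad_token_id=0))
```

Decoding with `skip_special_tokens=True` then drops the restored pad tokens, which is why the labels must be mapped back before `batch_decode` is called.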

### Check and Load model

In [24]:
config = AutoConfig.from_pretrained(model_ckpt)

In [25]:
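# from_config builds the mT5 architecture with randomly initialised weights (training from scratch).
# Use AutoModelForSeq2SeqLM.from_pretrained(model_ckpt) instead to start from the pretrained checkpoint.
model = AutoModelForSeq2SeqLM.from_config(config)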

In [26]:
model_name = model_ckpt.split("/")[-1]
args = Seq2SeqTrainingArguments(
    f"{model_name}-finetuned-{source_lang}-to-{target_lang}-domain",
    report_to='wandb',
    run_name=run_name,

    num_train_epochs=num_train_epochs,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,

    learning_rate=learning_rate,
    weight_decay=weight_decay,

    lr_scheduler_type=lr_scheduler_type,
    warmup_ratio=warmup_ratio,

    # predict_with_generate=predict_with_generate,
    # generation_max_length=generation_max_length,

    save_total_limit=save_total_limit,

    load_best_model_at_end=load_best_model_at_end,
    metric_for_best_model=metric_for_best_model,
    
    save_strategy=save_strategy,
    evaluation_strategy=evaluation_strategy,
    # save_steps=save_steps,
    # eval_steps=eval_steps,

    logging_strategy=logging_strategy,
    logging_first_step=logging_first_step, 
    logging_steps=logging_steps,
    
    fp16=fp16,
)
data_collator = DataCollatorForSeq2Seq(tokenizer, model=model)

# es = EarlyStoppingCallback(early_stopping_patience=early_stopping_patience)
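
To make the `lr_scheduler_type='cosine'` and `warmup_ratio=0.1` settings concrete, here is a minimal sketch of the learning rate over time. It assumes the scheduler follows the usual shape of linear warmup followed by a half-cosine decay to zero. The function name `cosine_lr` is hypothetical, and the step counts are taken from this run (599,160 total steps, of which 10% is 59,916 warmup steps).

```python
import math

def cosine_lr(step, base_lr, warmup_steps, total_steps):
    """Learning rate at `step`: linear warmup, then half-cosine decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))

for step in (0, 29958, 59916, 329538, 599160):
    print(step, cosine_lr(step, base_lr=2e-5, warmup_steps=59916, total_steps=599160))
```

Under these assumptions the rate peaks at `2e-5` when warmup ends (after about three epochs) and has fallen to half of that midway through the decay phase.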

In [27]:
trainer = Seq2SeqTrainer(
    model,
    args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_val,
    data_collator=data_collator,
    tokenizer=tokenizer,
    # callbacks=[es],
)
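
With `save_strategy="epoch"` and `save_total_limit=5`, only the most recent checkpoints stay on disk. The sketch below is a simplified model of that rotation, assuming `load_best_model_at_end=False` as configured here (so no checkpoint is protected as "best"). The function name `rotate_checkpoints` is hypothetical and this is not the Trainer's own implementation.

```python
from collections import deque

def rotate_checkpoints(saved_steps, save_total_limit):
    """Return (kept, deleted) after saving checkpoints in order with a size limit."""
    kept = deque()
    deleted = []
    for step in saved_steps:
        kept.append(step)
        # Drop the oldest checkpoints once the limit is exceeded.
        while len(kept) > save_total_limit:
            deleted.append(kept.popleft())
    return list(kept), deleted

steps_per_epoch = 19972
first_six_epochs = [steps_per_epoch * epoch for epoch in range(1, 7)]
print(rotate_checkpoints(first_six_epochs, save_total_limit=5))
```

This matches the training log, where `checkpoint-19972` is deleted right after `checkpoint-119832` is saved at the end of the sixth epoch.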

In [28]:
trainer.train()
wandb.finish()

The following columns in the training set don't have a corresponding argument in `MT5ForConditionalGeneration.forward` and have been ignored: ko, en. If ko, en are not expected by `MT5ForConditionalGeneration.forward`,  you can safely ignore this message.


***** Running training *****


  Num examples = 319551


  Num Epochs = 30


  Instantaneous batch size per device = 8


  Total train batch size (w. parallel, distributed & accumulation) = 16


  Gradient Accumulation steps = 2


  Total optimization steps = 599160


Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"


| Epoch | Training Loss | Validation Loss |
|---|---|---|
| 1 | 27.3596 | 20.843443 |
| 2 | 14.478 | 10.387636 |
| 3 | 6.9032 | 4.867607 |
| 4 | 2.9585 | 2.56342 |
| 5 | 2.5035 | 2.189458 |
| 6 | 2.1531 | 1.863204 |
| 7 | 1.8614 | 1.581821 |
| 8 | 1.5875 | 1.298213 |
| 9 | 1.3083 | 1.080985 |
| 10 | 1.1329 | 0.934237 |


The following columns in the evaluation set don't have a corresponding argument in `MT5ForConditionalGeneration.forward` and have been ignored: ko, en. If ko, en are not expected by `MT5ForConditionalGeneration.forward`,  you can safely ignore this message.


***** Running Evaluation *****


  Num examples = 40359


  Batch size = 8


Saving model checkpoint to mt5-small-finetuned-ko-to-en-domain/checkpoint-19972


Configuration saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-19972/config.json


Model weights saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-19972/pytorch_model.bin


tokenizer config file saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-19972/tokenizer_config.json


Special tokens file saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-19972/special_tokens_map.json


Saving model checkpoint to mt5-small-finetuned-ko-to-en-domain/checkpoint-39944


Configuration saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-39944/config.json


Model weights saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-39944/pytorch_model.bin


tokenizer config file saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-39944/tokenizer_config.json


Special tokens file saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-39944/special_tokens_map.json


Saving model checkpoint to mt5-small-finetuned-ko-to-en-domain/checkpoint-59916


Configuration saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-59916/config.json


Model weights saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-59916/pytorch_model.bin


tokenizer config file saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-59916/tokenizer_config.json


Special tokens file saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-59916/special_tokens_map.json


Saving model checkpoint to mt5-small-finetuned-ko-to-en-domain/checkpoint-79888


Configuration saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-79888/config.json


Model weights saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-79888/pytorch_model.bin


tokenizer config file saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-79888/tokenizer_config.json


Special tokens file saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-79888/special_tokens_map.json


Saving model checkpoint to mt5-small-finetuned-ko-to-en-domain/checkpoint-99860


Configuration saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-99860/config.json


Model weights saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-99860/pytorch_model.bin


tokenizer config file saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-99860/tokenizer_config.json


Special tokens file saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-99860/special_tokens_map.json


Saving model checkpoint to mt5-small-finetuned-ko-to-en-domain/checkpoint-119832


Configuration saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-119832/config.json


Model weights saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-119832/pytorch_model.bin


tokenizer config file saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-119832/tokenizer_config.json


Special tokens file saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-119832/special_tokens_map.json


Deleting older checkpoint [mt5-small-finetuned-ko-to-en-domain/checkpoint-19972] due to args.save_total_limit


Saving model checkpoint to mt5-small-finetuned-ko-to-en-domain/checkpoint-139804


Configuration saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-139804/config.json


Model weights saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-139804/pytorch_model.bin


tokenizer config file saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-139804/tokenizer_config.json


Special tokens file saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-139804/special_tokens_map.json


Deleting older checkpoint [mt5-small-finetuned-ko-to-en-domain/checkpoint-39944] due to args.save_total_limit


Saving model checkpoint to mt5-small-finetuned-ko-to-en-domain/checkpoint-159776


Configuration saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-159776/config.json


Model weights saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-159776/pytorch_model.bin


tokenizer config file saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-159776/tokenizer_config.json


Special tokens file saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-159776/special_tokens_map.json


Deleting older checkpoint [mt5-small-finetuned-ko-to-en-domain/checkpoint-59916] due to args.save_total_limit


Saving model checkpoint to mt5-small-finetuned-ko-to-en-domain/checkpoint-179748


Configuration saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-179748/config.json


Model weights saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-179748/pytorch_model.bin


tokenizer config file saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-179748/tokenizer_config.json


Special tokens file saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-179748/special_tokens_map.json


Deleting older checkpoint [mt5-small-finetuned-ko-to-en-domain/checkpoint-79888] due to args.save_total_limit


Saving model checkpoint to mt5-small-finetuned-ko-to-en-domain/checkpoint-199720


Configuration saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-199720/config.json


Model weights saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-199720/pytorch_model.bin


tokenizer config file saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-199720/tokenizer_config.json


Special tokens file saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-199720/special_tokens_map.json


Deleting older checkpoint [mt5-small-finetuned-ko-to-en-domain/checkpoint-99860] due to args.save_total_limit


Saving model checkpoint to mt5-small-finetuned-ko-to-en-domain/checkpoint-219692


Configuration saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-219692/config.json


Model weights saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-219692/pytorch_model.bin


tokenizer config file saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-219692/tokenizer_config.json


Special tokens file saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-219692/special_tokens_map.json


Deleting older checkpoint [mt5-small-finetuned-ko-to-en-domain/checkpoint-119832] due to args.save_total_limit


Saving model checkpoint to mt5-small-finetuned-ko-to-en-domain/checkpoint-239664


Configuration saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-239664/config.json


Model weights saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-239664/pytorch_model.bin


tokenizer config file saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-239664/tokenizer_config.json


Special tokens file saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-239664/special_tokens_map.json


Deleting older checkpoint [mt5-small-finetuned-ko-to-en-domain/checkpoint-139804] due to args.save_total_limit


Saving model checkpoint to mt5-small-finetuned-ko-to-en-domain/checkpoint-259636


Configuration saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-259636/config.json


Model weights saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-259636/pytorch_model.bin


tokenizer config file saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-259636/tokenizer_config.json


Special tokens file saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-259636/special_tokens_map.json


Deleting older checkpoint [mt5-small-finetuned-ko-to-en-domain/checkpoint-159776] due to args.save_total_limit


Saving model checkpoint to mt5-small-finetuned-ko-to-en-domain/checkpoint-279608


Configuration saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-279608/config.json


Model weights saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-279608/pytorch_model.bin


tokenizer config file saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-279608/tokenizer_config.json


Special tokens file saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-279608/special_tokens_map.json


Deleting older checkpoint [mt5-small-finetuned-ko-to-en-domain/checkpoint-179748] due to args.save_total_limit


Saving model checkpoint to mt5-small-finetuned-ko-to-en-domain/checkpoint-299580


Configuration saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-299580/config.json


Model weights saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-299580/pytorch_model.bin


tokenizer config file saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-299580/tokenizer_config.json


Special tokens file saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-299580/special_tokens_map.json


Deleting older checkpoint [mt5-small-finetuned-ko-to-en-domain/checkpoint-199720] due to args.save_total_limit


Saving model checkpoint to mt5-small-finetuned-ko-to-en-domain/checkpoint-319552


Configuration saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-319552/config.json


Model weights saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-319552/pytorch_model.bin


tokenizer config file saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-319552/tokenizer_config.json


Special tokens file saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-319552/special_tokens_map.json


Deleting older checkpoint [mt5-small-finetuned-ko-to-en-domain/checkpoint-219692] due to args.save_total_limit


Saving model checkpoint to mt5-small-finetuned-ko-to-en-domain/checkpoint-339524


Configuration saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-339524/config.json


Model weights saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-339524/pytorch_model.bin


tokenizer config file saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-339524/tokenizer_config.json


Special tokens file saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-339524/special_tokens_map.json


Deleting older checkpoint [mt5-small-finetuned-ko-to-en-domain/checkpoint-239664] due to args.save_total_limit


Saving model checkpoint to mt5-small-finetuned-ko-to-en-domain/checkpoint-359496


Configuration saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-359496/config.json


Model weights saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-359496/pytorch_model.bin


tokenizer config file saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-359496/tokenizer_config.json


Special tokens file saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-359496/special_tokens_map.json


Deleting older checkpoint [mt5-small-finetuned-ko-to-en-domain/checkpoint-259636] due to args.save_total_limit


Saving model checkpoint to mt5-small-finetuned-ko-to-en-domain/checkpoint-379468


Configuration saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-379468/config.json


Model weights saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-379468/pytorch_model.bin


tokenizer config file saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-379468/tokenizer_config.json


Special tokens file saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-379468/special_tokens_map.json


Deleting older checkpoint [mt5-small-finetuned-ko-to-en-domain/checkpoint-279608] due to args.save_total_limit


Saving model checkpoint to mt5-small-finetuned-ko-to-en-domain/checkpoint-399440


Configuration saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-399440/config.json


Model weights saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-399440/pytorch_model.bin


tokenizer config file saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-399440/tokenizer_config.json


Special tokens file saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-399440/special_tokens_map.json


Deleting older checkpoint [mt5-small-finetuned-ko-to-en-domain/checkpoint-299580] due to args.save_total_limit


Saving model checkpoint to mt5-small-finetuned-ko-to-en-domain/checkpoint-419412


Configuration saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-419412/config.json


Model weights saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-419412/pytorch_model.bin


tokenizer config file saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-419412/tokenizer_config.json


Special tokens file saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-419412/special_tokens_map.json


Deleting older checkpoint [mt5-small-finetuned-ko-to-en-domain/checkpoint-319552] due to args.save_total_limit


Saving model checkpoint to mt5-small-finetuned-ko-to-en-domain/checkpoint-439384


Configuration saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-439384/config.json


Model weights saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-439384/pytorch_model.bin


tokenizer config file saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-439384/tokenizer_config.json


Special tokens file saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-439384/special_tokens_map.json


Deleting older checkpoint [mt5-small-finetuned-ko-to-en-domain/checkpoint-339524] due to args.save_total_limit


***** Running Evaluation *****


  Num examples = 40359


  Batch size = 8


Saving model checkpoint to mt5-small-finetuned-ko-to-en-domain/checkpoint-459356


Configuration saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-459356/config.json


Model weights saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-459356/pytorch_model.bin


tokenizer config file saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-459356/tokenizer_config.json


Special tokens file saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-459356/special_tokens_map.json


Deleting older checkpoint [mt5-small-finetuned-ko-to-en-domain/checkpoint-359496] due to args.save_total_limit


***** Running Evaluation *****


  Num examples = 40359


  Batch size = 8


Saving model checkpoint to mt5-small-finetuned-ko-to-en-domain/checkpoint-479328


Configuration saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-479328/config.json


Model weights saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-479328/pytorch_model.bin


tokenizer config file saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-479328/tokenizer_config.json


Special tokens file saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-479328/special_tokens_map.json


Deleting older checkpoint [mt5-small-finetuned-ko-to-en-domain/checkpoint-379468] due to args.save_total_limit


***** Running Evaluation *****


  Num examples = 40359


  Batch size = 8


Saving model checkpoint to mt5-small-finetuned-ko-to-en-domain/checkpoint-499300


Configuration saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-499300/config.json


Model weights saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-499300/pytorch_model.bin


tokenizer config file saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-499300/tokenizer_config.json


Special tokens file saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-499300/special_tokens_map.json


Deleting older checkpoint [mt5-small-finetuned-ko-to-en-domain/checkpoint-399440] due to args.save_total_limit


***** Running Evaluation *****


  Num examples = 40359


  Batch size = 8


Saving model checkpoint to mt5-small-finetuned-ko-to-en-domain/checkpoint-519272


Configuration saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-519272/config.json


Model weights saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-519272/pytorch_model.bin


tokenizer config file saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-519272/tokenizer_config.json


Special tokens file saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-519272/special_tokens_map.json


Deleting older checkpoint [mt5-small-finetuned-ko-to-en-domain/checkpoint-419412] due to args.save_total_limit


***** Running Evaluation *****


  Num examples = 40359


  Batch size = 8


Saving model checkpoint to mt5-small-finetuned-ko-to-en-domain/checkpoint-539244


Configuration saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-539244/config.json


Model weights saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-539244/pytorch_model.bin


tokenizer config file saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-539244/tokenizer_config.json


Special tokens file saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-539244/special_tokens_map.json


Deleting older checkpoint [mt5-small-finetuned-ko-to-en-domain/checkpoint-439384] due to args.save_total_limit


***** Running Evaluation *****


  Num examples = 40359


  Batch size = 8


Saving model checkpoint to mt5-small-finetuned-ko-to-en-domain/checkpoint-559216


Configuration saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-559216/config.json


Model weights saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-559216/pytorch_model.bin


tokenizer config file saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-559216/tokenizer_config.json


Special tokens file saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-559216/special_tokens_map.json


Deleting older checkpoint [mt5-small-finetuned-ko-to-en-domain/checkpoint-459356] due to args.save_total_limit


***** Running Evaluation *****


  Num examples = 40359


  Batch size = 8


Saving model checkpoint to mt5-small-finetuned-ko-to-en-domain/checkpoint-579188


Configuration saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-579188/config.json


Model weights saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-579188/pytorch_model.bin


tokenizer config file saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-579188/tokenizer_config.json


Special tokens file saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-579188/special_tokens_map.json


Deleting older checkpoint [mt5-small-finetuned-ko-to-en-domain/checkpoint-479328] due to args.save_total_limit


***** Running Evaluation *****


  Num examples = 40359


  Batch size = 8


Saving model checkpoint to mt5-small-finetuned-ko-to-en-domain/checkpoint-599160


Configuration saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-599160/config.json


Model weights saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-599160/pytorch_model.bin


tokenizer config file saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-599160/tokenizer_config.json


Special tokens file saved in mt5-small-finetuned-ko-to-en-domain/checkpoint-599160/special_tokens_map.json


Deleting older checkpoint [mt5-small-finetuned-ko-to-en-domain/checkpoint-499300] due to args.save_total_limit




Training completed. Do not forget to share your model on huggingface.co/models =)
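
The checkpoint numbers in the log above follow a regular pattern: one checkpoint is written per epoch, and the oldest one is deleted as soon as more than `save_total_limit` checkpoints exist. The sketch below reproduces that rotation in plain Python. It is an illustration, not the `Trainer` implementation: the value of 19972 steps per epoch is inferred from the final global step (599160) divided by the 30 epochs, and `rotate_checkpoints` is a hypothetical helper name.

```python
# Illustrative simulation of per-epoch checkpoint rotation (not Trainer code).
TOTAL_STEPS = 599160   # final global step reported in the log
NUM_EPOCHS = 30        # num_train_epochs from the notebook parameters
SAVE_TOTAL_LIMIT = 5   # save_total_limit from the notebook parameters

# Inferred, not logged directly: 599160 / 30 = 19972 optimizer steps per epoch.
STEPS_PER_EPOCH = TOTAL_STEPS // NUM_EPOCHS


def rotate_checkpoints(num_epochs, steps_per_epoch, limit):
    """Return (kept, deleted) checkpoint steps after saving once per epoch."""
    kept, deleted = [], []
    for epoch in range(1, num_epochs + 1):
        kept.append(epoch * steps_per_epoch)  # save at the end of the epoch
        while len(kept) > limit:              # drop the oldest beyond the limit
            deleted.append(kept.pop(0))
    return kept, deleted


kept, deleted = rotate_checkpoints(NUM_EPOCHS, STEPS_PER_EPOCH, SAVE_TOTAL_LIMIT)
print("checkpoints left on disk:", kept)
print("last checkpoint deleted:", deleted[-1])
```

Under these assumptions, the five checkpoints left on disk are those of epochs 26 to 30, which agrees with the last deletion in the log (`checkpoint-499300`, the end of epoch 25).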




Saving model checkpoint to /tmp/tmppnnq87r_


Configuration saved in /tmp/tmppnnq87r_/config.json


Model weights saved in /tmp/tmppnnq87r_/pytorch_model.bin


tokenizer config file saved in /tmp/tmppnnq87r_/tokenizer_config.json


Special tokens file saved in /tmp/tmppnnq87r_/special_tokens_map.json


metric,history
eval/loss,█▄▃▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
eval/runtime,██▅▁▃▅▁▄▁▃▆▄█▃█▂▁▃▅▃▁▂▂▂▁▁██▅▄
eval/samples_per_second,▁▁▄█▆▄█▅█▆▃▅▁▅▁▇█▆▄▆█▆▆▇██▁▁▄▅
eval/steps_per_second,▁▁▄█▆▄█▅█▆▃▅▁▅▁▇█▆▄▆█▆▆▇██▁▁▄▅
train/epoch,▁▁▁▁▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
train/global_step,▁▁▁▁▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
train/learning_rate,▂▃▅▇███████▇▇▇▇▆▆▆▆▅▅▅▄▄▄▄▃▃▃▂▂▂▂▂▁▁▁▁▁▁
train/loss,█▆▄▃▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
train/total_flos,▁
train/train_loss,▁

metric,value
eval/loss,0.49442
eval/runtime,203.2087
eval/samples_per_second,198.609
eval/steps_per_second,24.827
train/epoch,30.0
train/global_step,599160.0
train/learning_rate,0.0
train/loss,0.5161
train/total_flos,7.48077003127081e+17
train/train_loss,3.34988
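
The `train/learning_rate` history (a short ramp followed by a slow decay to the final value of `0.0`) is the shape expected from `lr_scheduler_type = 'cosine'` with `warmup_ratio = 0.1`. The sketch below writes out the standard linear-warmup plus half-cosine formula using the notebook's `learning_rate = 2e-5` and the logged total of 599160 steps. `cosine_lr` is a hypothetical helper, and the formula is assumed to match what the scheduler computes; it is not copied from the library.

```python
import math

PEAK_LR = 2e-5        # learning_rate from the notebook parameters
WARMUP_RATIO = 0.1    # warmup_ratio from the notebook parameters
TOTAL_STEPS = 599160  # final global step reported in the run summary


def cosine_lr(step, total_steps=TOTAL_STEPS, peak_lr=PEAK_LR,
              warmup_ratio=WARMUP_RATIO):
    """Learning rate at `step`: linear warmup, then half-cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the decay phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, 29958, 59916, 329538, TOTAL_STEPS):
    print(f"step {step:>6}: lr = {cosine_lr(step):.3e}")
```

With these values the warmup covers the first 59916 steps (about three epochs), after which the rate falls smoothly to zero at the last step.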


In [29]:
trainer.save_model('./save_model')

Saving model checkpoint to ./save_model


Configuration saved in ./save_model/config.json


Model weights saved in ./save_model/pytorch_model.bin


tokenizer config file saved in ./save_model/tokenizer_config.json


Special tokens file saved in ./save_model/special_tokens_map.json


In [30]:
tokenizer = AutoTokenizer.from_pretrained('./save_model')

Didn't find file ./save_model/tokenizer.json. We won't load it.


Didn't find file ./save_model/added_tokens.json. We won't load it.


loading file ./save_model/spiece.model


loading file None


loading file None


loading file ./save_model/special_tokens_map.json


loading file ./save_model/tokenizer_config.json




In [31]:
model = AutoModelForSeq2SeqLM.from_pretrained('./save_model')

loading configuration file ./save_model/config.json


Model config MT5Config {
  "_name_or_path": "./save_model",
  "architectures": [
    "MT5ForConditionalGeneration"
  ],
  "d_ff": 1024,
  "d_kv": 64,
  "d_model": 512,
  "decoder_start_token_id": 0,
  "dense_act_fn": "gelu_new",
  "dropout_rate": 0.1,
  "eos_token_id": 1,
  "feed_forward_proj": "gated-gelu",
  "initializer_factor": 1.0,
  "is_encoder_decoder": true,
  "is_gated_act": true,
  "layer_norm_epsilon": 1e-06,
  "model_type": "mt5",
  "num_decoder_layers": 8,
  "num_heads": 6,
  "num_layers": 8,
  "pad_token_id": 0,
  "relative_attention_max_distance": 128,
  "relative_attention_num_buckets": 32,
  "tie_word_embeddings": false,
  "tokenizer_class": "T5Tokenizer",
  "torch_dtype": "float32",
  "transformers_version": "4.21.0",
  "use_cache": true,
  "vocab_size": 250112
}



loading weights file ./save_model/pytorch_model.bin


All model checkpoint weights were used when initializing MT5ForConditionalGeneration.



All the weights of MT5ForConditionalGeneration were initialized from the model checkpoint at ./save_model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use MT5ForConditionalGeneration for predictions without further training.


In [32]:
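args = Seq2SeqTrainingArguments(
    output_dir='eval',            # directory for evaluation outputs
    per_device_eval_batch_size=16,
    predict_with_generate=True,   # decode with generate() so BLEU can be computed
    generation_max_length=256,    # cap on the length of generated sequences
)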
data_collator = DataCollatorForSeq2Seq(tokenizer, model=model)

PyTorch: setting up devices


The default value for the training argument `--report_to` will change in v5 (from all installed integrations to none). In v5, you will need to use `--report_to all` to get the same behavior as now. You should start updating your code and make this info disappear :-).


In [33]:
trainer = Seq2SeqTrainer(
    model,
    args,
    # train_dataset=tokenized_train,
    eval_dataset=tokenized_val,
    data_collator=data_collator,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)

In [34]:
trainer.evaluate()

The following columns in the evaluation set don't have a corresponding argument in `MT5ForConditionalGeneration.forward` and have been ignored: ko, en. If ko, en are not expected by `MT5ForConditionalGeneration.forward`,  you can safely ignore this message.


***** Running Evaluation *****


  Num examples = 40359


  Batch size = 16


Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"


{'eval_loss': 0.49537065625190735,
 'eval_bleu': 67.5308,
 'eval_gen_len': 41.5994,
 'eval_runtime': 1804.8666,
 'eval_samples_per_second': 22.361,
 'eval_steps_per_second': 1.398}
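
The throughput figures in the two evaluation reports can be re-derived from the example count, the batch size, and the runtime, which also makes the cost of `predict_with_generate=True` explicit. The sketch below does that arithmetic with the logged numbers. `throughput` is a hypothetical helper, and reading `exp(eval_loss)` as a per-token perplexity assumes the loss is a mean token-level cross-entropy.

```python
import math

NUM_EXAMPLES = 40359  # "Num examples" in the evaluation logs


def throughput(num_examples, batch_size, runtime_s):
    """Return (samples per second, steps per second) for one evaluation pass."""
    steps = math.ceil(num_examples / batch_size)  # last batch may be partial
    return num_examples / runtime_s, steps / runtime_s


# Loss-only evaluation during training: batch size 8, 203.2087 s.
loss_only = throughput(NUM_EXAMPLES, 8, 203.2087)
# Final evaluation with generation: batch size 16, 1804.8666 s.
with_generate = throughput(NUM_EXAMPLES, 16, 1804.8666)

slowdown = 1804.8666 / 203.2087
perplexity = math.exp(0.49537065625190735)  # assumes mean token cross-entropy

print(f"loss only:     {loss_only[0]:.3f} samples/s, {loss_only[1]:.3f} steps/s")
print(f"with generate: {with_generate[0]:.3f} samples/s, {with_generate[1]:.3f} steps/s")
print(f"generation makes evaluation about {slowdown:.1f}x slower")
print(f"exp(eval_loss) = {perplexity:.3f}")
```

Running the full generation-based evaluation once after training, rather than after every epoch, keeps the per-epoch evaluation at roughly 3.4 minutes instead of about 30.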