For those who use the Huggingface ecosystem, we provide a module that can train various downstream tasks. Using ke_t5.pipe, which ports part of Google's seqio to huggingface datasets, you can train tasks with a single command.
First, install the required packages.
pip install -r requirements.txt
The models provided by default fall into two broad categories.
- Seq2Seq Model (Generative)
- Encoder Model (BERT-like)

For Seq2Seq models both the input and the output are text, while for Encoder models the output is usually class logits (token level or sequence level).
Tasks whose names end in _gen train these seq2seq models. Tasks without the _gen suffix can be trained with an Encoder model plus a task head. (For tasks that are only possible as seq2seq, there are exceptions to this naming rule.)
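As a quick illustration of this naming rule (the helper below is a sketch, not part of ke_t5):

```python
def task_is_generative(task_name: str) -> bool:
    """Tasks ending in `_gen` target seq2seq (generative) models;
    the rest target encoder models with a task head.
    Note the doc mentions naming-rule exceptions for seq2seq-only tasks."""
    return task_name.endswith("_gen")

# klue_tc_gen -> seq2seq model; klue_tc -> encoder model with a head
assert task_is_generative("klue_tc_gen")
assert not task_is_generative("klue_tc")
```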
For the supported downstream tasks, please refer to ke_t5/task/task.py.
The following shows how to train nikl_summarization_summary_split with a KE-T5 model. (Since summarization can only be trained with a generative model, the task name has no _gen suffix.)

(Note: for datasets that cannot be downloaded automatically, such as NIKL, you must download the data yourself, extract it, and pass the root directory via --hf_data_dir. For how to prepare the NIKL data, see here.)
python -m torch.distributed.launch --nproc_per_node=2 train_ddp.py \
--batch_size 32 \
--hf_data_dir "./data" \
--hf_cache_dir "./cache_dir/huggingface_datasets" \
--train_split "train[:90%]" \
--test_split "train[90%:]" \
--pass_only_model_io true \
--gin_param="get_dataset.sequence_length={'inputs':512, 'targets':512}" \
--gin_param="ke_t5.task.utils.get_vocabulary.vocab_name='KETI-AIR/ke-t5-base'" \
--pre_trained_model="KETI-AIR/ke-t5-base" \
--model_name "transformers:T5ForConditionalGeneration" \
--task 'nikl_summarization_summary_split'
Setting --pass_only_model_io to true builds mini-batches only from the features used as model I/O. Most generative models can be evaluated with just the input and target tensors, so setting this to true avoids unnecessary computation. For some other tasks the input and target alone are not enough to compute the metrics (NER, extractive QA, etc.); in those cases this value must be set to false. The default is false.
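As a rough illustration of what this flag controls (a sketch, not the actual ke_t5 implementation; the function name is hypothetical):

```python
def filter_batch(batch: dict, model_io_keys, pass_only_model_io: bool) -> dict:
    """When pass_only_model_io is True, keep only the features the model
    consumes (e.g. input_ids/labels); otherwise pass everything through so
    metrics that need extra fields (NER spans, QA answers, ...) still work."""
    if not pass_only_model_io:
        return batch
    return {k: v for k, v in batch.items() if k in model_io_keys}

batch = {"input_ids": [1, 2], "labels": [3], "answer_text": "foo"}
small = filter_batch(batch, {"input_ids", "labels"}, pass_only_model_io=True)
# "answer_text" is dropped; only the model I/O features remain
```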
As in the example above, gin_param lets you set the sequence lengths of the inputs and targets and the name of the huggingface tokenizer used to preprocess the data. These values can also be written into a *.gin file in advance and passed via --gin_file.
The gin/train_default.gin file, for example, looks like this:
get_dataset.sequence_length={'inputs':512, 'targets':512}
ke_t5.task.utils.get_vocabulary.vocab_name='KETI-AIR/ke-t5-base'
get_optimizer.optimizer_cls=@AdamW
AdamW.lr=1e-3
AdamW.betas=(0.9, 0.999)
AdamW.eps=1e-06
AdamW.weight_decay=1e-2
As shown above, not only the sequence lengths and the vocabulary but also the optimizer and its hyperparameters are specified. To train with this configuration, run:
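gin itself performs these bindings; as a rough, stdlib-only sketch of what each `target.param=value` line means (illustrative only, not how gin is implemented):

```python
import ast

def parse_bindings(text: str) -> dict:
    """Parse simple `target.param=value` lines (a tiny subset of gin syntax)
    into {target: {param: value}}, e.g. the AdamW keyword arguments."""
    bindings = {}
    for line in text.strip().splitlines():
        lhs, rhs = line.split("=", 1)
        target, param = lhs.strip().rsplit(".", 1)
        try:
            value = ast.literal_eval(rhs.strip())
        except (ValueError, SyntaxError):
            value = rhs.strip()  # references like @AdamW stay strings here
        bindings.setdefault(target, {})[param] = value
    return bindings

cfg = parse_bindings("""
get_optimizer.optimizer_cls=@AdamW
AdamW.lr=1e-3
AdamW.betas=(0.9, 0.999)
AdamW.weight_decay=1e-2
""")
# cfg["AdamW"] now holds the keyword arguments for the optimizer
```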
python -m torch.distributed.launch --nproc_per_node=2 train_ddp.py \
--batch_size 32 \
--gin_file="gin/train_default.gin" \
--pre_trained_model="KETI-AIR/ke-t5-base" \
--model_name transformers:T5ForConditionalGeneration \
--task 'nikl_summarization_summary_split'
Tasks can also be trained with models other than KE-T5. To train klue topic classification with the klue/roberta-small model, run the command below. In this case the RoBERTa-based sequence classification model is the RobertaForSequenceClassification class of the transformers package.
Pass --model_name in the form {module path}:{class name}. Models you wrote yourself can be trained in exactly the same way, as long as their inputs and outputs match those of huggingface models.
(gin/klue_roberta_tc.gin sets the vocabulary to that of klue/roberta-small. For classification, the sequence length of targets has little meaning.)
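The `{module path}:{class name}` convention can be sketched as follows (a stdlib class stands in for e.g. transformers:RobertaForSequenceClassification, since this is only an illustration of the loading mechanism):

```python
import importlib

def resolve_model_name(model_name: str):
    """Resolve a `{module path}:{class name}` string to a class,
    mirroring the format --model_name expects (illustrative sketch)."""
    module_path, class_name = model_name.split(":")
    module = importlib.import_module(module_path)
    return getattr(module, class_name)

# Stdlib stand-in for something like "transformers:RobertaForSequenceClassification"
cls = resolve_model_name("collections:OrderedDict")
assert cls.__name__ == "OrderedDict"
```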
python -m torch.distributed.launch --nproc_per_node=2 train_ddp.py \
--batch_size 16 \
--gin_file="gin/klue_roberta_tc.gin" \
--pre_trained_model "klue/roberta-small" \
--model_name transformers:RobertaForSequenceClassification \
--task 'klue_tc'
To resume training, pass true or a checkpoint path to --resume. (With true, the checkpoint is loaded from the default path.)
To save the model with huggingface's save_pretrained function so that it can later be loaded with from_pretrained, pass the target folder to --hf_path.
python -m torch.distributed.launch --nproc_per_node=2 train_ddp.py \
--batch_size 16 \
--gin_file="gin/klue_roberta_tc.gin" \
--pre_trained_model "klue/roberta-small" \
--model_name transformers:RobertaForSequenceClassification \
--task 'klue_tc' \
--resume true \
--hf_path hf_out/klue_bert_tc
All of the explanations so far assumed a distributed setting; --nproc_per_node is the number of GPUs used for training. To train on a single GPU, either set this value to 1 or replace the python -m torch.distributed.launch --nproc_per_node=2 part with plain python.
At test time, for generative models you may want to use the generate function to get huggingface beam search, top_p, top_k, and so on. In that case, set EvaluationHelper's model_fn to the name of the function you want to call, and pass the function's keyword arguments as model_kwargs. (If omitted, the kwargs specified in the task are used.) model_input_keys selects which fields of the input data are fed to the function; if omitted, only input_ids is passed. (This value is a list of keys fed to the model.)
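The dispatch described above can be sketched like this (DummyModel and run_model are illustrative stand-ins, not ke_t5 code):

```python
class DummyModel:
    """Stand-in for a Hugging Face model; only used to illustrate dispatch."""
    def forward(self, input_ids):
        return {"fn": "forward", "n": len(input_ids)}
    def generate(self, input_ids, num_beams=1, max_length=20):
        return {"fn": "generate", "num_beams": num_beams}

def run_model(model, batch, model_fn="forward",
              model_kwargs=None, model_input_keys=("input_ids",)):
    """Sketch of how an evaluation helper might dispatch: pick the method
    named by model_fn, feed only the selected batch fields, and pass the
    extra keyword arguments (beam search settings etc.)."""
    inputs = {k: batch[k] for k in model_input_keys}
    return getattr(model, model_fn)(**inputs, **(model_kwargs or {}))

batch = {"input_ids": [1, 2, 3], "attention_mask": [1, 1, 1]}
out = run_model(DummyModel(), batch, model_fn="generate",
                model_kwargs={"num_beams": 4})
```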
gin/test_default_gen.gin
get_dataset.sequence_length={'inputs':512, 'targets':512}
ke_t5.task.utils.get_vocabulary.vocab_name='KETI-AIR/ke-t5-base'
EvaluationHelper.model_fn='generate'
EvaluationHelper.model_kwargs={
"early_stopping": True,
"length_penalty": 2.0,
"max_length": 200,
"min_length": 30,
"no_repeat_ngram_size": 3,
"num_beams": 4,
}
# EvaluationHelper.model_input_keys=['input_ids']
# Test!!!
python -m torch.distributed.launch --nproc_per_node=2 test_ddp.py \
--gin_file="gin/test_default_gen.gin" \
--model_name "transformers:T5ForConditionalGeneration" \
--task 'nikl_summarization_summary_split' \
--test_split test \
--resume true
The following is a partial list of the currently supported downstream tasks.
Task name | Type |
---|---|
klue_tc_gen | Generative |
klue_tc | Sequence Classification - single_label_classification |
klue_nli_gen | Generative |
klue_nli | Sequence Classification - single_label_classification |
klue_sts_gen | Generative |
klue_sts_re | Sequence Classification - regression |
klue_sts | Sequence Classification - single_label_classification |
klue_re | Sequence Classification - single_label_classification |
klue_ner | Token Classification |
nikl_ner | Token Classification |
nikl_ner2020 | Token Classification |
nikl_summarization_summary | Generative |
nikl_summarization_topic | Generative |
korquad_gen | Generative |
korquad_gen_context_free | Generative |
kor_3i4k_gen | Generative |
kor_3i4k | Sequence Classification - single_label_classification |
The currently supported T5-based models are listed below.
Model name | Type |
---|---|
transformers:T5ForConditionalGeneration | Generative |
T5EncoderForSequenceClassificationSimple | Sequence Classification - single_label_classification |
T5EncoderForSequenceClassificationMean | Sequence Classification - single_label_classification |
T5EncoderForTokenClassification | Token Classification |
T5EncoderForEntityRecognitionWithCRF | Token Classification |
Any model that inherits from a Huggingface model and returns a huggingface output type from forward can be used as well. For example, suppose you wrote my_model.py as follows. (Refer to ke_t5/models/models.py when writing models.)
my_model_dir/my_model.py
from transformers import T5EncoderModel
from ke_t5.models.loader import register_model
@register_model("abcdefg")
class MyModel(T5EncoderModel):
...
The @register_model decorator above registers the MyModel class under the name abcdefg. You can therefore pass abcdefg, without any module path, as --model_name.
If you do not attach the decorator, pass my_model_dir.my_model:MyModel instead.
For example, among the provided models, the T5EncoderForSequenceClassificationMean class lives in ke_t5.models.models and is registered under the name T5EncoderForSequenceClassificationMean, so either T5EncoderForSequenceClassificationMean or ke_t5.models.models:T5EncoderForSequenceClassificationMean can be passed as --model_name.
Classes named in camel case can also be referred to with an underscore before each capital letter, e.g. ke_t5.models.models:t5_encoder_for_sequence_classification_mean.
To make a custom module visible to the train_ddp script, pass the module path to --module_import.
python -m torch.distributed.launch --nproc_per_node=2 train_ddp.py \
--batch_size 16 \
--gin_file="my_model.gin" \
--pre_trained_model "path_to_pretrained_model_weights" \
--model_name "abcdefg" \
--task 'klue_tc' \
--module_import "my_model_dir.my_model"
Don't forget to pass the huggingface vocab path that matches your model. We share a few sample models.
task | model | base model | URL |
---|---|---|---|
nikl_ner | T5EncoderForEntityRecognitionWithCRF | KETI-AIR/ke-t5-base | Download |
nikl_ner2020 | T5EncoderForEntityRecognitionWithCRF | KETI-AIR/ke-t5-base | Download |
Sample code (assuming the nikl_ner model from the sample models above has been downloaded):
from transformers import T5Tokenizer
from ke_t5.models import loader, models
model_path = 'path_to_model_directory'
model_name = 'T5EncoderForEntityRecognitionWithCRF'
model_cls = loader.load_model(model_name)
tokenizer = T5Tokenizer.from_pretrained(model_path)
model = model_cls.from_pretrained(model_path)
id2label = model.config.id2label
# Source: Kyeongsang Ilbo (http://www.ksilbo.co.kr)
# author: ์ด์ถ๋ด๊ธฐ์ bong@ksilbo.co.kr
# URL: http://www.ksilbo.co.kr/news/articleView.html?idxno=903455
input_txt = ("์ธ์ฐ์์ค๊ณต๋จ์ ๋ค์ํ ๊ฝยท๋๋ฌด ๊ฐ์ ๊ธฐํ๋ฅผ ์ ๊ณตํด ์๋ฏผ๋ค์ "
             "์ฝ๋ก๋ ๋ธ๋ฃจ๋ฅผ ํด์ํ๊ณ ์ด์์ ์ธ ๊ณต๊ฐ์ ์ฐ์ถํ๊ธฐ ์ํด ์ธ์ฐ๋๊ณต์ ์ธ์ฐ๋์ข๋คํธ "
             "์ผ์ธ๊ณต์ฐ์ฅ ์๋จ์ ํด๋ฐ๋ผ๊ธฐ ์ ์์ ์กฐ์ฑํ๋ค๊ณ 13์ผ ๋ฐํ๋ค.")
inputs = tokenizer(input_txt, return_tensors="pt")
output = model(
input_ids=inputs.input_ids,
attention_mask=inputs.attention_mask,
)
input_ids = inputs.input_ids[0]
predicted_classes = output.logits[0]  # for the CRF model, these are decoded tag ids
inp_tks = [tokenizer.decode(x) for x in input_ids]
lbls = [id2label[int(x)] for x in predicted_classes]  # id2label is keyed by int
print(list(zip(inp_tks, lbls)))
# --------------------------------------------------------
## For NIKL NER:
[('์ธ์ฐ', 'B-OG'), ('์์ค๊ณต๋จ', 'I-OG'), ('์', 'O'), ('๋ค์ํ', 'O'),
('๊ฝ', 'B-PT'), ('ยท', 'O'), ('๋๋ฌด', 'B-PT'), ('๊ฐ์', 'O'),
('๊ธฐํ๋ฅผ', 'O'), ('์ ๊ณตํด', 'O'), ('์๋ฏผ๋ค์', 'B-CV'), ('์ฝ๋ก๋', 'O'),
('๋ธ๋ฃจ', 'O'), ('๋ฅผ', 'O'), ('ํด์ํ๊ณ ', 'O'), ('์ด์์ ์ธ', 'O'),
('๊ณต๊ฐ์', 'O'), ('์ฐ์ถ', 'O'), ('ํ๊ธฐ', 'O'), ('์ํด', 'O'),
('์ธ์ฐ', 'B-LC'), ('๋', 'I-LC'), ('๊ณต์', 'I-LC'), ('์ธ์ฐ', 'B-LC'),
('๋', 'I-LC'), ('์ข', 'I-LC'), ('๋คํธ', 'B-TM'), ('์ผ์ธ', 'O'),
('๊ณต์ฐ์ฅ', 'O'), ('์๋จ', 'O'), ('์', 'O'), ('ํด๋ฐ๋ผ๊ธฐ', 'B-PT'),
('์ ์์', 'O'), ('์กฐ์ฑํ๋ค', 'O'), ('๊ณ ', 'O'), ('13', 'B-DT'),
('์ผ', 'I-DT'), ('๋ฐํ๋ค', 'O'), ('.', 'O'), ('</s>', 'O')]
## For NIKL NER 2020:
[('์ธ์ฐ', 'B-OGG_POLITICS'), ('์์ค๊ณต๋จ', 'I-OGG_POLITICS'),
('์', 'O'), ('๋ค์ํ', 'O'), ('๊ฝ', 'B-PT_PART'), ('ยท', 'O'),
('๋๋ฌด', 'O'), ('๊ฐ์', 'O'), ('๊ธฐํ๋ฅผ', 'O'), ('์ ๊ณตํด', 'O'),
('์๋ฏผ๋ค์', 'O'), ('์ฝ๋ก๋', 'O'), ('๋ธ๋ฃจ', 'O'), ('๋ฅผ', 'O'),
('ํด์ํ๊ณ ', 'O'), ('์ด์์ ์ธ', 'O'), ('๊ณต๊ฐ์', 'O'), ('์ฐ์ถ', 'O'),
('ํ๊ธฐ', 'O'), ('์ํด', 'O'), ('์ธ์ฐ', 'B-LC_OTHERS'),
('๋', 'I-LC_OTHERS'), ('๊ณต์', 'I-LC_OTHERS'),
('์ธ์ฐ', 'B-AF_CULTURAL_ASSET'), ('๋', 'I-AF_CULTURAL_ASSET'),
('์ข', 'I-AF_CULTURAL_ASSET'), ('๋คํธ', 'O'), ('์ผ์ธ', 'O'),
('๊ณต์ฐ์ฅ', 'O'), ('์๋จ', 'O'), ('์', 'O'), ('ํด๋ฐ๋ผ๊ธฐ', 'B-PT_FLOWER'),
('์ ์์', 'O'), ('์กฐ์ฑํ๋ค', 'O'), ('๊ณ ', 'O'), ('13', 'B-DT_DAY'),
('์ผ', 'I-DT_DAY'), ('๋ฐํ๋ค', 'O'), ('.', 'O'), ('</s>', 'O')]
# --------------------------------------------------------
Below are the configurations used to train the models for each task.
train_RE.gin
get_dataset.sequence_length={'inputs':512, 'targets':512}
ke_t5.task.utils.get_vocabulary.vocab_name='KETI-AIR/ke-t5-base'
get_optimizer.optimizer_cls=@AdamW
AdamW.lr=1e-5
AdamW.betas=(0.9, 0.999)
AdamW.eps=1e-06
AdamW.weight_decay=1e-2
gin/RE_test_default.gin
get_dataset.sequence_length={'inputs':512, 'targets':8}
ke_t5.task.utils.get_vocabulary.vocab_name='KETI-AIR/ke-t5-base'
EvaluationHelper.model_fn='forward'
EvaluationHelper.model_input_keys=['input_ids', 'attention_mask', 'entity_token_idx']
Command
CUDA_VISIBLE_DEVICES='0' python train_ddp.py --batch_size 32 --gin_file="gin/train_RE.gin" --pre_trained_model "KETI-AIR/ke-t5-base" --model_name T5EncoderForSequenceClassificationMeanSubmeanObjmean --task 'klue_re_tk_idx' --epochs 50 --train_split train --valid_split test
EPOCHS=50
BSZ=16
NUM_PROC=8
WORKERS=0
PRE_TRAINED_MODEL="KETI-AIR/ke-t5-base"
TASK=klue_re_tk_idx
MODEL=T5EncoderForSequenceClassificationFirstSubmeanObjmean
# training
python -m torch.distributed.launch --nproc_per_node=${NUM_PROC} \
train_ddp.py \
--gin_file="tmp/train_RE.gin" \
--model_name ${MODEL} --task ${TASK} \
--train_split "train" --valid_split "test" \
--epochs ${EPOCHS} \
--batch_size ${BSZ} \
--workers ${WORKERS} \
--pre_trained_model ${PRE_TRAINED_MODEL}
# test
python -m torch.distributed.launch --nproc_per_node=${NUM_PROC} \
test_ddp.py \
--gin_file="tmp/test_RE.gin" \
--model_name ${MODEL} \
--task ${TASK} \
--batch_size ${BSZ} \
--pre_trained_model ${PRE_TRAINED_MODEL} \
--resume output/${MODEL}_KETI-AIR_ke-t5-base/${TASK}/weights/best_model.pth
Performance
task | model | base model | *F1mic | URL |
---|---|---|---|---|
KLUE RE | T5EncoderForSequenceClassificationFirstSubmeanObjmean | KETI-AIR/ke-t5-base | 73.64 | Download |
* The F1mic of KLUE-RE is the micro-averaged F1 score ignoring the no_relation class.
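This metric can be sketched as follows (a minimal illustration under the usual KLUE-RE convention, not the project's exact evaluation code; the relation label strings are made up for the example):

```python
def micro_f1_ignoring(y_true, y_pred, ignore_label="no_relation"):
    """Micro-averaged F1 that excludes ignore_label from the label set:
    a correct no_relation prediction counts neither as a true positive
    nor as an error."""
    labels = {l for l in list(y_true) + list(y_pred) if l != ignore_label}
    tp = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        for l in labels:
            if p == l and t == l:
                tp += 1
            elif p == l and t != l:
                fp += 1
            elif p != l and t == l:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0

y_true = ["per:employee_of", "no_relation", "org:founded"]
y_pred = ["per:employee_of", "org:founded", "org:founded"]
# tp=2, fp=1, fn=0 -> precision 2/3, recall 1 -> F1 = 0.8
score = micro_f1_ignoring(y_true, y_pred)
```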
gin/train_default.gin
get_dataset.sequence_length={'inputs':512, 'targets':512}
ke_t5.task.utils.get_vocabulary.vocab_name='KETI-AIR/ke-t5-base'
get_optimizer.optimizer_cls=@AdamW
AdamW.lr=3e-4
AdamW.betas=(0.9, 0.999)
AdamW.eps=1e-06
AdamW.weight_decay=1e-2
gin/test_default.gin
get_dataset.sequence_length={'inputs':512, 'targets':512}
ke_t5.task.utils.get_vocabulary.vocab_name='KETI-AIR/ke-t5-base'
EvaluationHelper.model_fn='forward'
EvaluationHelper.model_input_keys=['input_ids', 'attention_mask']
Command
EPOCHS=3
BSZ=24
NUM_PROC=8
WORKERS=0
PRE_TRAINED_MODEL="KETI-AIR/ke-t5-base"
TASK=klue_tc
MODEL=T5EncoderForSequenceClassificationMean
# training
python -m torch.distributed.launch --nproc_per_node=${NUM_PROC} \
train_ddp.py \
--gin_file="tmp/train_default.gin" \
--model_name ${MODEL} --task ${TASK} \
--train_split "train" --valid_split "test" \
--epochs ${EPOCHS} \
--batch_size ${BSZ} \
--workers ${WORKERS} \
--pre_trained_model ${PRE_TRAINED_MODEL}
# test
python -m torch.distributed.launch --nproc_per_node=${NUM_PROC} \
test_ddp.py \
--gin_file="tmp/test_default.gin" \
--model_name ${MODEL} \
--task ${TASK} \
--batch_size ${BSZ} \
--pre_trained_model ${PRE_TRAINED_MODEL} \
--resume output/${MODEL}_KETI-AIR_ke-t5-base/${TASK}/weights/best_model.pth
Performance
task | model | base model | Acc. |
---|---|---|---|
KLUE TC | T5EncoderForSequenceClassificationMean | ke-t5-base | 85.579 |
The gin files are the same as those for topic classification.
Command
EPOCHS=15
BSZ=24
NUM_PROC=8
WORKERS=0
PRE_TRAINED_MODEL="KETI-AIR/ke-t5-base"
TASK=klue_nli
MODEL=T5EncoderForSequenceClassificationMean
# training
python -m torch.distributed.launch --nproc_per_node=${NUM_PROC} \
train_ddp.py \
--gin_file="tmp/train_default.gin" \
--model_name ${MODEL} --task ${TASK} \
--train_split "train" --valid_split "test" \
--epochs ${EPOCHS} \
--batch_size ${BSZ} \
--workers ${WORKERS} \
--pre_trained_model ${PRE_TRAINED_MODEL}
# test
python -m torch.distributed.launch --nproc_per_node=${NUM_PROC} \
test_ddp.py \
--gin_file="tmp/test_default.gin" \
--model_name ${MODEL} \
--task ${TASK} \
--batch_size ${BSZ} \
--pre_trained_model ${PRE_TRAINED_MODEL} \
--resume best
# save the model as Hugging face style.
python -m torch.distributed.launch --nproc_per_node=${NUM_PROC} \
train_ddp.py \
--gin_file="tmp/train_default.gin" \
--model_name ${MODEL} --task ${TASK} \
--train_split "train" --valid_split "test" \
--epochs ${EPOCHS} \
--batch_size ${BSZ} \
--workers ${WORKERS} \
--resume true \
--hf_path default \
--pre_trained_model ${PRE_TRAINED_MODEL}
Performance
task | model | base model | Acc. |
---|---|---|---|
KLUE NLI | T5EncoderForSequenceClassificationMean | ke-t5-base | 85 |
TODO
- Add a description of Seq Pipe
- Add mixture tasks for the generative models
- Add Coreference Resolution code