## Python check

In [1]:
import sys

python = sys.executable
print(f"* Python path: {python}")
print("* Library version:")
!echo '  -' `$python -m pip list | grep -w numpy`
!echo '  -' `$python -m pip list | grep -w datasets`
!echo '  -' `$python -m pip list | grep -w torch`
!echo '  -' `$python -m pip list | grep -w LTNtorch`
!echo '  -' `$python -m pip list | grep -w transformers`

* Python path: /home/chris/miniconda3/envs/LTN/bin/python
* Library version:
  - numpy 1.21.5
  - datasets 1.17.0
  - torch 1.10.1
  - LTNtorch 0.9
  - transformers 4.15.0
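
The shell pipeline above depends on `pip list` and `grep` being available. A roughly equivalent check can be sketched in pure Python with the standard-library `importlib.metadata` module (Python 3.8 or later). The helper name `installed_version` is hypothetical and is not part of the notebook or of `main`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Package names copied from the cell above.
packages = ["numpy", "datasets", "torch", "LTNtorch", "transformers"]

print("* Library version:")
for name in packages:
    print(f"  - {name} {installed_version(name) or '(not installed)'}")
```

Unlike `grep -w`, this looks each package up by its exact distribution name, so a missing package is reported explicitly instead of printing an empty line.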


## Data check

In [2]:
from main import data_files
from pathlib import Path

data_dir = Path(data_files['train']).parent

print("* data_files:")
for k, v in data_files.items():
    print(f"  - {k}: {v}")
print('-' * 112)
!ls -l $data_dir

* data_files:
  - train: data/klue-sts-cls/train.json
  - valid: data/klue-sts-cls/valid.json
----------------------------------------------------------------------------------------------------------------
total 3764
-rw-rw-r-- 1 chris chris 3690197 Dec 30 16:55 train.json
-rw-rw-r-- 1 chris chris  160929 Dec 30 16:55 valid.json
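
The `!ls -l` call only works inside IPython on a Unix-like shell. As a minimal sketch, the same listing of file names and sizes can be produced with `pathlib`. The helper `list_files` is a hypothetical name, and the directory path is taken from the `data_files` output above.

```python
from pathlib import Path


def list_files(directory):
    """Return (name, size_in_bytes) pairs for regular files in `directory`, sorted by name."""
    return sorted(
        (p.name, p.stat().st_size) for p in Path(directory).iterdir() if p.is_file()
    )


# Directory taken from the data_files output above; skipped if it does not exist here.
data_dir = Path("data/klue-sts-cls")
if data_dir.is_dir():
    for name, size in list_files(data_dir):
        print(f"{size:>10} {name}")
```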


## Environment settings

In [3]:
from main import gpu_ids, lang_models, do_experiment

gpu_id = 2
print(f"* available gpu_ids: {gpu_ids}")
print(f"* gpu_id: {gpu_id}")

lm_id = 2
print("* available lang_models: ")
for m in lang_models:
    print(f"  - {m}")
print(f"* lang_model: {lang_models[lm_id]}")

* available gpu_ids: (0, 1, 2, 3)
* gpu_id: 2
* available lang_models: 
  - bert-base-multilingual-uncased
  - skt/kobert-base-v1
  - monologg/koelectra-base-v3-discriminator
  - monologg/kobigbird-bert-base
* lang_model: monologg/koelectra-base-v3-discriminator


## KLUE-STS(cls) [KoELECTRA]

In [4]:
do_experiment(gpu_id=gpu_id, lang_model=lang_models[lm_id], learning_rate=2e-5, max_seq_length=512, max_epoch=10)


[device] cuda:2 ∈ [cuda:0, cuda:1, cuda:2, cuda:3]



Using custom data configuration default-b3f46ff75b254c98
Reusing dataset json (/home/chris/.cache/huggingface/datasets/json/default-b3f46ff75b254c98/0.0.0/c90812beea906fcffe0d5e3bb9eba909a80a998b5f88e9f8acbd320aa91acfde)


  0%|          | 0/2 [00:00<?, ?it/s]


[raw_datasets] DatasetDict({
    train: Dataset({
        features: ['guid', 'sentence1', 'sentence2', 'label'],
        num_rows: 11668
    })
    valid: Dataset({
        features: ['guid', 'sentence1', 'sentence2', 'label'],
        num_rows: 519
    })
})
- input_columns: sentence1, sentence2


[tokenizer(ElectraTokenizerFast)] PreTrainedTokenizerFast(name_or_path='monologg/koelectra-base-v3-discriminator', vocab_size=35000, model_max_len=512, is_fast=True, padding_side='right', special_tokens={'unk_token': '[UNK]', 'sep_token': '[SEP]', 'pad_token': '[PAD]', 'cls_token': '[CLS]', 'mask_token': '[MASK]'})
- text   = [CLS] 한국어 사전학습 모델을 공유합니다. [SEP]
- tokens = ['[CLS]', '한국어', '사전', '##학습', '모델', '##을', '공유', '##합니다', '.', '[SEP]']
- ids    = [2, 11229, 7485, 26694, 6918, 4292, 7824, 17788, 18, 3]



Running tokenizer on dataset:   0%|          | 0/6 [00:00<?, ?ba/s]


- [tokens](512)	= ['[CLS]', '숙소', '위치', '##는', '찾기', '쉽', '##고', '일반', '##적', '##인', '한국', '##의', '반지', '##하', '숙소', '##입니다', '.', '[SEP]', '숙박', '##시설', '##의', '위치', '##는', '쉽', '##게', '...', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]']



Running tokenizer on dataset:   0%|          | 0/1 [00:00<?, ?ba/s]

Some weights of the model checkpoint at monologg/koelectra-base-v3-discriminator were not used when initializing ElectraModel: ['discriminator_predictions.dense_prediction.bias', 'discriminator_predictions.dense.bias', 'discriminator_predictions.dense_prediction.weight', 'discriminator_predictions.dense.weight']
- This IS expected if you are initializing ElectraModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing ElectraModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).



[pretrained] ElectraModel(
  (embeddings): ElectraEmbeddings(
    (word_embeddings): Embedding(35000, 768, padding_idx=0)
    (position_embeddings): Embedding(512, 768)
    (token_type_embeddings): Embedding(2, 768)
    (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
    (dropout): Dropout(p=0.1, inplace=False)
  )
  ...
)
-      input_ids(2x512) : [2, 11229, 7485, 26694, 6918, 4292, 7824, 17788, 18, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, '...', 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
- attention_mask(2x512) : [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, '...', 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
- token_type_ids(2x512) : [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, '...', 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
-  output_hidden(2x512x768) : tensor([[ 0.0427, -0.5928, -0.3165,  ..., -0.179

0
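
The `input_ids` and `attention_mask` rows in the log above show the example sentence right-padded to `max_seq_length=512` with id 0 (the `[PAD]` id, matching `padding_idx=0`). The following is a minimal sketch of that padding step in plain Python. `pad_encoding` is a hypothetical helper written for illustration; it is not the code that `do_experiment` or the tokenizer actually runs.

```python
def pad_encoding(ids, max_seq_length, pad_id=0):
    """Right-pad `ids` to `max_seq_length` and build the matching attention mask."""
    if len(ids) > max_seq_length:
        raise ValueError("sequence is longer than max_seq_length")
    n_pad = max_seq_length - len(ids)
    input_ids = ids + [pad_id] * n_pad
    # 1 marks a real token, 0 marks padding that the model should ignore.
    attention_mask = [1] * len(ids) + [0] * n_pad
    return input_ids, attention_mask


# Token ids of the example sentence, copied from the tokenizer output above.
ids = [2, 11229, 7485, 26694, 6918, 4292, 7824, 17788, 18, 3]
input_ids, attention_mask = pad_encoding(ids, max_seq_length=512)
print(len(input_ids), sum(attention_mask))  # 512 10
```

The ten leading ones in the mask correspond to the ten tokens from `[CLS]` to `[SEP]`, which agrees with the `attention_mask` row printed in the log.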

In [5]:
do_experiment(gpu_id=gpu_id, lang_model=lang_models[lm_id], learning_rate=1e-5, max_seq_length=512, max_epoch=10)



[device] cuda:2 ∈ [cuda:0, cuda:1, cuda:2, cuda:3]



Using custom data configuration default-b3f46ff75b254c98
Reusing dataset json (/home/chris/.cache/huggingface/datasets/json/default-b3f46ff75b254c98/0.0.0/c90812beea906fcffe0d5e3bb9eba909a80a998b5f88e9f8acbd320aa91acfde)


  0%|          | 0/2 [00:00<?, ?it/s]


[raw_datasets] DatasetDict({
    train: Dataset({
        features: ['guid', 'sentence1', 'sentence2', 'label'],
        num_rows: 11668
    })
    valid: Dataset({
        features: ['guid', 'sentence1', 'sentence2', 'label'],
        num_rows: 519
    })
})
- input_columns: sentence1, sentence2


[tokenizer(ElectraTokenizerFast)] PreTrainedTokenizerFast(name_or_path='monologg/koelectra-base-v3-discriminator', vocab_size=35000, model_max_len=512, is_fast=True, padding_side='right', special_tokens={'unk_token': '[UNK]', 'sep_token': '[SEP]', 'pad_token': '[PAD]', 'cls_token': '[CLS]', 'mask_token': '[MASK]'})
- text   = [CLS] 한국어 사전학습 모델을 공유합니다. [SEP]
- tokens = ['[CLS]', '한국어', '사전', '##학습', '모델', '##을', '공유', '##합니다', '.', '[SEP]']
- ids    = [2, 11229, 7485, 26694, 6918, 4292, 7824, 17788, 18, 3]



Running tokenizer on dataset:   0%|          | 0/6 [00:00<?, ?ba/s]


- [tokens](512)	= ['[CLS]', '숙소', '위치', '##는', '찾기', '쉽', '##고', '일반', '##적', '##인', '한국', '##의', '반지', '##하', '숙소', '##입니다', '.', '[SEP]', '숙박', '##시설', '##의', '위치', '##는', '쉽', '##게', '...', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]']



Running tokenizer on dataset:   0%|          | 0/1 [00:00<?, ?ba/s]

Some weights of the model checkpoint at monologg/koelectra-base-v3-discriminator were not used when initializing ElectraModel: ['discriminator_predictions.dense_prediction.bias', 'discriminator_predictions.dense.bias', 'discriminator_predictions.dense_prediction.weight', 'discriminator_predictions.dense.weight']
- This IS expected if you are initializing ElectraModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing ElectraModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).



[pretrained] ElectraModel(
  (embeddings): ElectraEmbeddings(
    (word_embeddings): Embedding(35000, 768, padding_idx=0)
    (position_embeddings): Embedding(512, 768)
    (token_type_embeddings): Embedding(2, 768)
    (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
    (dropout): Dropout(p=0.1, inplace=False)
  )
  ...
)
-      input_ids(2x512) : [2, 11229, 7485, 26694, 6918, 4292, 7824, 17788, 18, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, '...', 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
- attention_mask(2x512) : [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, '...', 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
- token_type_ids(2x512) : [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, '...', 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
-  output_hidden(2x512x768) : tensor([[ 0.0427, -0.5928, -0.3165,  ..., -0.179

0
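
The `token_type_ids` row in the log is all zeros because the printed example is a single sentence. For the sentence pairs in the dataset (`sentence1`, `sentence2`), BERT-style tokenizers, including the ELECTRA tokenizer in `transformers`, conventionally assign segment 0 to `[CLS]`, the first sentence, and the first `[SEP]`, segment 1 to the second sentence and its `[SEP]`, and 0 to padding. Below is a minimal sketch of that convention. `pair_token_type_ids` is a hypothetical helper, and the token list is a shortened, illustrative version of the pair shown in the log, not real output.

```python
def pair_token_type_ids(tokens, sep_token="[SEP]", pad_token="[PAD]"):
    """Return segment ids: 0 through the first [SEP], 1 afterwards, 0 for padding."""
    segment = 0
    type_ids = []
    for tok in tokens:
        if tok == pad_token:
            type_ids.append(0)  # padding positions carry segment id 0
            continue
        type_ids.append(segment)
        if tok == sep_token:
            segment = 1  # everything after the first [SEP] is the second segment
    return type_ids


# Shortened, illustrative sentence pair (assumption: not actual log output).
tokens = ["[CLS]", "숙소", "위치", "##는", "[SEP]", "숙박", "##시설", "[SEP]", "[PAD]", "[PAD]"]
print(pair_token_type_ids(tokens))  # [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]
```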