<a href="https://colab.research.google.com/github/ttogle918/NLU_3-/blob/main/%EC%B5%9C%EC%A7%80%ED%98%84_sts.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **NLU - 문장 유사도 계산 (STS)**



- 과제 목표
  - 두 개의 한국어 문장을 입력받아 두 문장의 의미적 유사도를 출력
  - regression task ( 0 <= target <= 5 ) **=> klue 결과값이 0~5이다! logit을 정규화할 필요!**
    -  as a real value from 0 (no meaning overlap) to 5 (meaning equivalence)
    - [klue](https://klue-benchmark.com/tasks/67/overview/description)
- 학습 데이터 셋 ( 다운로드 가능 & 제공 예정 )
  - KLUE-STS
    - AIRBNB ( 리뷰 )
    - policy ( 뉴스 )
    - paraKOQC ( 스마트홈 쿼리 )
- 과제 결과물
  - 학습된 모델 ( 모델 자유 선택 ) ( train set만 사용해 학습 )
  - 학습 방식 보고서
    - 어떤 모델을 선택했나
    - 어떻게 파라미터를 튜닝했나
    - 어떤 훈련 과정을 거쳤는가
  - dev set score ( F1 )
  - 문장 유사도를 출력하는 API ( 프레임워크 자유 선택 )

- [graykode/ALBERT-Pytorch](https://github.com/graykode/ALBERT-Pytorch)
- [huggingface](https://huggingface.co/docs/transformers/model_doc/albert)
- [korsts](https://github.com/kakaobrain/KorNLUDatasets)


유사도 계산.. 순서는 상관없는 것 같다..

albert가 sop(문장 순서 예측)을 통해 모델을 훈련하기 때문에 사용하려고 했는데, 유사도만 계산하는 것이기 때문에 albert를 사용할 때 장점이 크지는 않을 것 같다.

해야할 부분 

1. 모델 바꿔서 해보기!

In [None]:
!pip install optuna
!pip install pytorch-transformers
!pip install transformers
!pip install datasets

In [1]:
import torch
from torch import nn
import torch.nn.functional as F
import torch.optim as optim
from torch.optim import AdamW
from torch.nn.utils import clip_grad_norm_
from torch.utils.data import Dataset, DataLoader
import numpy as np
from tqdm import tqdm, tqdm_notebook
from sklearn.metrics import f1_score
from scipy import stats
import time
import matplotlib.pyplot as plt

In [2]:
# gpu 연산이 가능하면 'cuda:0', 아니면 'cpu' 출력
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
device, torch.cuda.device_count()

(device(type='cuda', index=0), 1)

In [3]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [4]:
from transformers import BertForNextSentencePrediction, AutoTokenizer, BertConfig
from transformers.optimization import get_cosine_schedule_with_warmup
from transformers import AdamW
from transformers import get_linear_schedule_with_warmup

- "klue/roberta-large"
- "klue/roberta-small"
- "klue/roberta-base"
- "klue/bert-base"
- [klue에 등록된 모델](https://huggingface.co/klue)
- [한국어언어모델](https://littlefoxdiary.tistory.com/81)

# KLUE 데이터셋

[klue-sts-벤치마크-구조-보기](https://velog.io/@soyoun9798/KLUE-STS-%EB%B2%A4%EC%B9%98%EB%A7%88%ED%81%AC-%EA%B5%AC%EC%A1%B0-%EB%B3%B4%EA%B8%B0)


In [5]:
from datasets import load_dataset
dataset = load_dataset('klue', 'sts')

Reusing dataset klue (/root/.cache/huggingface/datasets/klue/sts/1.0.0/e0fc3bc3de3eb03be2c92d72fd04a60ecc71903f821619cb28ca0e1e29e4233e)


  0%|          | 0/2 [00:00<?, ?it/s]

In [6]:
print(f"type(dataset) : {type(dataset)}")
print(f"key : {dataset.keys()}")
print(f"type dataset[train] : {type(dataset['train'])}")
print(f"dataset[train] : {dataset['train']} \n\n")
# labels : { 이진분류 : 1, 반올림 값 : 3.7, 실제 label 값 : 3.71422... }
dataset['train'][0]

type(dataset) : <class 'datasets.dataset_dict.DatasetDict'>
key : dict_keys(['train', 'validation'])
type dataset[train] : <class 'datasets.arrow_dataset.Dataset'>
dataset[train] : Dataset({
    features: ['guid', 'source', 'sentence1', 'sentence2', 'labels'],
    num_rows: 11668
}) 




{'guid': 'klue-sts-v1_train_00000',
 'labels': {'binary-label': 1, 'label': 3.7, 'real-label': 3.714285714285714},
 'sentence1': '숙소 위치는 찾기 쉽고 일반적인 한국의 반지하 숙소입니다.',
 'sentence2': '숙박시설의 위치는 쉽게 찾을 수 있고 한국의 대표적인 반지하 숙박시설입니다.',
 'source': 'airbnb-rtt'}

In [7]:
# 데이터 10개만 확인
i = 0
for d in dataset['train'] :
  if i == 10 : break
  print(d)
  i += 1

{'guid': 'klue-sts-v1_train_00000', 'source': 'airbnb-rtt', 'sentence1': '숙소 위치는 찾기 쉽고 일반적인 한국의 반지하 숙소입니다.', 'sentence2': '숙박시설의 위치는 쉽게 찾을 수 있고 한국의 대표적인 반지하 숙박시설입니다.', 'labels': {'label': 3.7, 'real-label': 3.714285714285714, 'binary-label': 1}}
{'guid': 'klue-sts-v1_train_00001', 'source': 'policy-sampled', 'sentence1': '위반행위 조사 등을 거부·방해·기피한 자는 500만원 이하 과태료 부과 대상이다.', 'sentence2': '시민들 스스로 자발적인 예방 노력을\xa0한 것은 아산 뿐만이 아니었다.', 'labels': {'label': 0.0, 'real-label': 0.0, 'binary-label': 0}}
{'guid': 'klue-sts-v1_train_00002', 'source': 'paraKQC-sampled', 'sentence1': '회사가 보낸 메일은 이 지메일이 아니라 다른 지메일 계정으로 전달해줘.', 'sentence2': '사람들이 주로 네이버 메일을 쓰는 이유를 알려줘', 'labels': {'label': 0.3, 'real-label': 0.3333333333333333, 'binary-label': 0}}
{'guid': 'klue-sts-v1_train_00003', 'source': 'policy-sampled', 'sentence1': '긴급 고용안정지원금은 지역고용대응 등 특별지원금, 지자체별 소상공인 지원사업, 취업성공패키지, 청년구직활동지원금, 긴급복지지원제도 지원금과는 중복 수급이 불가능하다.', 'sentence2': '고용보험이 1차 고용안전망이라면, 국민취업지원제도는 2차 고용안전망입니다.', 'labels': {'label': 0.6, 'real-la

## dataset -> DataFrame

In [8]:
import pandas as pd

In [9]:
df_dev = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/nlp/sts-dev.tsv', sep='\t', on_bad_lines='skip')
df_train = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/nlp/sts-train.tsv', sep='\t', on_bad_lines='skip')
df_test = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/nlp/sts-test.tsv', sep='\t', on_bad_lines='skip')

In [10]:
print(f'shape : train {df_train.shape}, dev {df_dev.shape}, test {df_test.shape}')
df_train.head()

shape : train (5696, 7), dev (1466, 7), test (1379, 7)


Unnamed: 0,genre,filename,year,id,score,sentence1,sentence2
0,main-captions,MSRvid,2012test,1,5.0,비행기가 이륙하고 있다.,비행기가 이륙하고 있다.
1,main-captions,MSRvid,2012test,4,3.8,한 남자가 큰 플루트를 연주하고 있다.,남자가 플루트를 연주하고 있다.
2,main-captions,MSRvid,2012test,5,3.8,한 남자가 피자에 치즈를 뿌려놓고 있다.,한 남자가 구운 피자에 치즈 조각을 뿌려놓고 있다.
3,main-captions,MSRvid,2012test,6,2.6,세 남자가 체스를 하고 있다.,두 남자가 체스를 하고 있다.
4,main-captions,MSRvid,2012test,9,4.25,한 남자가 첼로를 연주하고 있다.,자리에 앉은 남자가 첼로를 연주하고 있다.


In [11]:
df_train = df_train.drop(labels=['genre', 'filename', 'year', 'id'], axis=1)
df_dev = df_dev.drop(labels=['genre', 'filename', 'year', 'id'], axis=1)
df_test = df_test.drop(labels=['genre', 'filename', 'year', 'id'], axis=1)
df_test = pd.concat([df_test, df_dev])
print(f'shape : train {df_train.shape}, dev {df_dev.shape}, test {df_test.shape}')
df_test.head()

shape : train (5696, 3), dev (1466, 3), test (2845, 3)


Unnamed: 0,score,sentence1,sentence2
0,2.5,한 소녀가 머리를 스타일링하고 있다.,한 소녀가 머리를 빗고 있다.
1,3.6,한 무리의 남자들이 해변에서 축구를 한다.,한 무리의 소년들이 해변에서 축구를 하고 있다.
2,5.0,한 여성이 다른 여성의 발목을 재고 있다.,한 여자는 다른 여자의 발목을 측정한다.
3,4.2,한 남자가 오이를 자르고 있다.,한 남자가 오이를 자르고 있다.
4,1.5,한 남자가 하프를 연주하고 있다.,한 남자가 키보드를 연주하고 있다.


In [12]:
df_train=df_train.dropna()
df_test=df_test.dropna()
df_train.shape, df_test.shape

((5691, 3), (2841, 3))

In [13]:
df_train.score.min(), df_train.score.max()

(0.0, 5.0)

In [14]:
sentence1, sentence2, labels = [], [], []

for data in dataset['train'] :
  sentence1.append(data['sentence1'])
  sentence2.append(data['sentence2'])
  labels.append(data['labels']['real-label'])
  # labels.append(data['labels'])
df = pd.DataFrame({'sentence1' : sentence1, 'sentence2' : sentence2, 'labels' : labels})
df.head(3)

Unnamed: 0,sentence1,sentence2,labels
0,숙소 위치는 찾기 쉽고 일반적인 한국의 반지하 숙소입니다.,숙박시설의 위치는 쉽게 찾을 수 있고 한국의 대표적인 반지하 숙박시설입니다.,3.714286
1,위반행위 조사 등을 거부·방해·기피한 자는 500만원 이하 과태료 부과 대상이다.,시민들 스스로 자발적인 예방 노력을 한 것은 아산 뿐만이 아니었다.,0.0
2,회사가 보낸 메일은 이 지메일이 아니라 다른 지메일 계정으로 전달해줘.,사람들이 주로 네이버 메일을 쓰는 이유를 알려줘,0.333333


In [15]:
del df
del sentence1
del sentence2
del labels

# Dataset Tokenizing -> dataLoader

In [16]:
import re
from torch.utils.data import Dataset, DataLoader, RandomSampler, SequentialSampler

In [17]:
class CustomDataset(Dataset):
    def __init__(self, dataset, appended_data):
        self.sentence1, self.sentence2, self.labels = self.make_dataset(dataset, appended_data)
        # self.make_dataset(dataset, appended_data)

    def make_dataset(self, dataset, appended_data):
        """
        self.label : dataset의 label의 list
        self.input : sentence1, sentence2를 tokenizer한 값을 이어 붙임 
        rlabels : # real-label
        """
        sentence1, sentence2, rlabels = [], [], []

        for data in dataset :
          rlabels.append(data['labels']['real-label'])
          sentence1.append(self.cleaning(data['sentence1']))
          sentence2.append(self.cleaning(data['sentence2']))

        if appended_data is not None :        
          rlabels.extend(appended_data['score'].to_list())
          sentence1.extend(appended_data['sentence1'].to_list())
          sentence2.extend(appended_data['sentence2'].to_list())
        return sentence1, sentence2, rlabels

        # self.tensorized_input, self.tensorized_label = custom_collate_fn((sentence1, sentence2, rlabels))
        
    def __len__(self):
        # return len(self.tensorized_input)
        return len(self.labels)

    def __getitem__(self, idx):
        return self.sentence1[idx], self.sentence2[idx], self.labels[idx]
        # return self.tensorized_input[idx], self.tensorized_label[idx]

    def cleaning(self, sentence) :
        return re.sub('a-zA-Z一-龥㐀-䶵豈-龎[-=+#/\:^$@*\"※~&%ㆍ』\\‘〈〉|\(\)\[\]\<\>`\'…》《]','', sentence)

In [18]:
def custom_collate_fn(batch):
    input1_list, input2_list, target_list = [], [], []

    for _input1, _input2, _target in batch:
        input1_list.append(_input1)
        input2_list.append(_input2)
        target_list.append(_target)
    
    tensorized_input = tokenizer(
        input1_list, input2_list,
        add_special_tokens=True,
        padding="longest",  # 배치내 가장 긴 문장을 기준으로 부족한 문장은 [PAD] 토큰을 추가
        truncation=True, # max_length를 넘는 문장은 이 후 토큰을 제거함
        max_length=512,
        return_tensors='pt' # 토크나이즈된 결과 값을 텐서 형태로 반환
    )
    tensorized_label = torch.tensor(target_list)

    return tensorized_input, tensorized_label

In [19]:
# 1. 32, 32
def make_dataloader(dataset, tok_model, batch_size, s='train') :
  global tokenizer
  tokenizer = AutoTokenizer.from_pretrained(tok_model)
  if s == 'train' :
    dataloader = DataLoader(
        dataset,
        batch_size =batch_size,
        sampler = RandomSampler(dataset),
        collate_fn = custom_collate_fn
    )
  else :
    dataloader = DataLoader(
        dataset,
        batch_size =batch_size,
        sampler = SequentialSampler(dataset),
        collate_fn = custom_collate_fn
    )
  return dataloader

# model class

In [20]:
# 모델 클래스
class CustomSTS(nn.Module):
    def __init__(self, hidden_size: int, model_name):
        super(CustomSTS, self).__init__()
        self.bert_config = BertConfig.from_pretrained(model_name)   
        self.model = BertForNextSentencePrediction.from_pretrained(model_name, config=self.bert_config)
        self.softmax = nn.Softmax(dim=1)

    def forward(self, input_ids=None, attention_mask=None, token_type_ids=None, labels=None):
        """
        outputs(NextSentencePredictorOutput) : logtis, loss(next_sentence_label이 주어질 때 return)
                                              hidden_states(optional), attentions(optional) 을 가지고 있다.
        loss는 주어진 label이 0~5 사이의 값으로 scale 되어있기 때문에 직접 구해야한다!
        """
        # logits's shape : (batch_size, 2)
        logits = self.model(
            input_ids,
            attention_mask=attention_mask,
            token_type_ids=token_type_ids,
        ).logits
        probs = self.softmax(logits)
        probs = probs[:, 0] * 5    # 0~5 사이의 값으로 정답(T)일 확률 뽑아내기
        return probs    # 정답(T)일 확률, 정답일때 1 

# train

### model, optimizer, scheduler 초기화

In [21]:
def initializer(train_dataloader, epochs=2, model_name='snunlp/KR-Medium', lr=4e-5, wd=4e-5):
    """
    모델, 옵티마이저, 스케쥴러 초기화
    """
    model = CustomSTS(hidden_size=768, model_name=model_name)   # hidden size?

    optimizer = AdamW(
        model.parameters(), # update 대상 파라미터를 입력
        lr=lr,
        eps=1e-8,
        weight_decay=wd
    )
    
    total_steps = len(train_dataloader) * epochs
    print(f"Total train steps with {epochs} epochs: {total_steps}")

    scheduler = get_linear_schedule_with_warmup(
        optimizer, 
        num_warmup_steps = 0, # 여기서는 warmup을 사용하지 않는다.
        num_training_steps = total_steps
    )
    return model, optimizer, scheduler

### checkpoint

In [22]:
def save_checkpoint(path, model, optimizer, scheduler, epoch, loss, f1, pearson, model_name=''):
    file_name = f'{path}/model_name_epoch:{epoch}_loss:{loss:.4f}_f1:{f1:.4f}_pearson:{pearson:.4f}.ckpt'
    
    torch.save({
            'epoch': epoch,
            'model_state_dict': model.state_dict(),
            'optimizer_state_dict': optimizer.state_dict(),
            'scheduler_state_dict': scheduler.state_dict(),
            'loss' : loss,
            'f1' : f1
        }, 
        file_name
    )
    
    print(f"Saving epoch {epoch} checkpoint at {file_name}")

### train code

In [23]:
def train(model, optimizer, scheduler, train_dataloader, valid_dataloader=None, epochs=1, model_name=''):
        loss_fct = nn.MSELoss()
        train_dict = {'loss' : [], 'f1' : []}
        valid_dict = {'loss' : [], 'f1' : [], 'pearson' : []}
        global before_loss, before_f1, before_pearson
        before_loss, before_f1, before_pearson = 0.4, 0.8, 0.8
        for epoch in range(epochs) :

            print(f"*****Epoch {epoch} Train Start*****")
            # 배치 단위 평균 loss와 총 평균 loss 계산하기위해 변수 생성
            total_loss, total_f1, batch_f1, batch_loss, batch_count = 0,0,0,0,0
            
            # model을 train 모드로 설정 & device 할당
            model.train()
            model.to(device)
            
            # data iterator를 돌면서 하나씩 학습
            for step, batch in enumerate(train_dataloader):
                batch_count+=1
                
                # tensor 연산 전, 각 tensor에 device 할당
                batch = tuple(item.to(device) for item in batch)
                
                batch_input, batch_label = batch
                
                # batch마다 모델이 갖고 있는 기존 gradient를 초기화/??
                model.zero_grad()
                
                # forward
                probs = model(**batch_input)

                # loss
                loss = loss_fct(probs, batch_label)
                batch_loss += loss.item()
                total_loss += loss.item()

                # pearsonr 상관계수
                # pearson = torch.corrcoef(torch.stack([probs, batch_label], dim=0))   #stats.pearsonr(probs, batch_label)[0]
                
                # f1-score
                f1 = f1_score([0 if p < 3 else 1 for p in batch_label], [0 if p < 3 else 1 for p in probs])
                batch_f1 += f1
                total_f1 += f1

                # backward -> 파라미터의 미분(gradient)를 자동으로 계산
                loss.backward()

                # gradient clipping 적용 
                clip_grad_norm_(model.parameters(), 1.0)
                
                # optimizer & scheduler 업데이트
                optimizer.step()
                scheduler.step()
        
                # 그래디언트 초기화
                model.zero_grad()
        
                # 배치 64개씩 처리할 때마다 평균 loss와 lr를 출력
                if (step % 128 == 0 and step != 0):
                    learning_rate = optimizer.param_groups[0]['lr']
                    print(f"Epoch: {epoch}, Step : {step}, LR : {learning_rate:.10f}, Avg Loss : {batch_loss / batch_count:.4f}, f1 score : {batch_f1 / batch_count:.4f}")

                    # 변수 초기화
                    batch_loss, batch_f1, batch_count = 0,0,0


            print(f"Epoch {epoch} Total Mean Loss : {total_loss/(step+1):.4f}")
            print(f"Epoch {epoch} Total Mean f1 : {total_f1/(step+1):.4f}")
            print(f"*****Epoch {epoch} Train Finish*****\n")

            train_dict['f1'].append(total_f1/(step+1))
            train_dict['loss'].append(total_loss/(step+1))
            
            if valid_dataloader is not None:
                print(f"*****Epoch {epoch} Valid Start*****")
                valid_loss, valid_acc, valid_f1, valid_pearson = validate(model, valid_dataloader)
                print(f"Epoch {epoch} Valid Loss : {valid_loss:.4f} Valid Acc : {valid_acc:.4f} Valid f1 : {valid_f1:.4f}")
                print(f"pearson 상관 계수 ; {valid_pearson}")
                print(f"*****Epoch {epoch} Valid Finish*****\n")

            valid_dict['f1'].append(valid_f1)
            valid_dict['loss'].append(valid_loss)
            valid_dict['pearson'].append(valid_pearson) #.sum()

            if before_loss > valid_loss :
                before_loss = valid_loss
                save_checkpoint("/content/drive/MyDrive/Colab Notebooks/nlp", model, optimizer, scheduler, epoch, valid_loss, valid_f1, valid_pearson, model_name)

            elif before_f1 < valid_f1  :
                before_f1 = valid_f1
                save_checkpoint("/content/drive/MyDrive/Colab Notebooks/nlp", model, optimizer, scheduler, epoch, valid_loss, valid_f1, valid_pearson, model_name)
            
            elif before_pearson < valid_pearson  :
                before_pearson = valid_pearson
                save_checkpoint("/content/drive/MyDrive/Colab Notebooks/nlp", model, optimizer, scheduler, epoch, valid_loss, valid_f1, valid_pearson, model_name)

        print("Train Finished")
        return train_dict, valid_dict

### validation code

In [24]:
def validate(model, valid_dataloader):
    loss_fct = nn.MSELoss()
    # 모델을 evaluate 모드로 설정 & device 할당
    model.eval()
    model.to(device)
    
    total_loss, total_acc, total_f1, total_pearson= 0,0, 0, 0
        
    for step, batch in enumerate(valid_dataloader):
        
        # tensor 연산 전, 각 tensor에 device 할당
        batch = tuple(item.to(device) for item in batch)
            
        batch_input, batch_label = batch
            
        # gradient 계산하지 않음
        with torch.no_grad():
            probs = model(**batch_input)
            
        # loss
        loss = loss_fct(probs, batch_label)
        total_loss += loss.item()
        
        # accuracy
        acc = 0
        for p, b in zip(probs, batch_label) :
          if (p > 3 and b > 3) or (p < 3 and b < 3 ) :
            acc += 1
        
        acc = acc / len(probs)
        total_acc+=acc
        
        # pearsonr 상관계수
        pearson = torch.corrcoef(torch.stack([probs, batch_label], dim=0))
        total_pearson += pearson

        # f1-score
        f1 = f1_score([0 if p < 3 else 1 for p in batch_label], [0 if p < 3 else 1 for p in probs])
        total_f1 += f1

    total_loss = total_loss/(step+1)
    total_acc = total_acc/(step+1)*100
    total_f1 = total_f1/(step+1)
    total_pearson = total_pearson/(step+1)
    return total_loss, total_acc, total_f1, total_pearson

# hyperparameter 조정

[optuna 사용법](https://dacon.io/codeshare/2704)

- transformer의 trainer 사용
  - [trainer huggingface](https://huggingface.co/docs/transformers/main_classes/trainer)
  - [bert에 optuna 사용법-medium](https://medium.com/carbon-consulting/transformer-models-hyperparameter-optimization-with-the-optuna-299e185044a8)
  - [bert에 optuna 사용법-git블로그](https://thigm85.github.io/blog/search/cord19/bert/transformers/optuna/2020/11/07/bert-training-optuna-tuning.html)


In [25]:
import optuna 

In [26]:
# train_batch_size, valid_batch_size = 64, 32
# train_dataloader = make_dataloader(dataset['train'], 'klue/bert-base', train_batch_size, df_train, 'train')
# valid_dataloader = make_dataloader(dataset['validation'], 'klue/bert-base', valid_batch_size, df_test, 'valid')

# epochs=20
# model, optimizer, scheduler = initializer(train_dataloader, epochs, 'klue/bert-base')   # 여기에 config 선언되어있다.

# start = time.time()
# train(model, optimizer, scheduler, train_dataloader, valid_dataloader, epochs)
# end = time.time()
# print(f"time : {(end - start)//60}분 {(end - start)%60}초")

In [27]:
# del train_dataloader
# del valid_dataloader
# del model
# del optimizer
# del scheduler

In [28]:
def draw_plot(train_dict, valid_dict, s) :
        #   train_dict = {'loss' : [], 'f1' : []}
        # valid_dict = {'loss' : [], 'f1' : [], 'pearson' : []}
  print(s)
  plt.subplot(1, 2, 1)
  plt.xlabel('Epochs')
  plt.title('Loss and F1 of Train data(green=loss, gray=f1')
  y_values= [n for n in range(len(train_dict['loss']))]
  plt.plot(train_dict['loss'], y_values, color='green', marker='o')  # loss
  plt.plot(train_dict['f1'], y_values, color='#AAAAAA', marker='*')  # f1

  plt.subplot(1, 2, 2)
  plt.xlabel('Epochs')
  plt.title('Loss and F1 of Validation data(green=loss, gray=f1, stealblue=pearson')
  y_values= [n for n in range(len(valid_dict['loss']))]
  plt.plot(valid_dict['loss'], y_values, color='green', marker='o')  # loss
  plt.plot(valid_dict['f1'], y_values, color='#AAAAAA', marker='*')  # f1
  plt.plot(valid_dict['pearson'], y_values, color='stealblue', marker='s')  # pearson

  plt.show()
  # plt.savefig(f'{s}.png')

In [29]:
def objective(trial: optuna.Trial):
    train_batch_size = trial.suggest_categorical("per_device_train_batch_size", [16, 32, 64, 128])
    model_name = 'klue/bert-base'# trial.suggest_categorical("model_name", ['klue/bert-base', "snunlp/KR-Medium"])    # 2가지만 비교!
    
    train_dataloader = make_dataloader(train_dataset, model_name, train_batch_size, 'train')
    valid_dataloader = make_dataloader(valid_dataset, model_name, 32, 'valid')

    learning_rate = trial.suggest_loguniform('learning_rate', low=1e-6, high=0.01)
    weight_decay = trial.suggest_loguniform('weight_decay', 1e-5, 0.01)

    # num_train_epochs = trial.suggest_int('num_train_epochs', low = 1,high= 20)
    model, optimizer, scheduler = initializer(train_dataloader, 1, model_name, learning_rate, weight_decay)   # 여기에 config 선언되어있다.

    start = time.time()
    train_dict, valid_dict = train(model, optimizer, scheduler, train_dataloader, valid_dataloader, 10, model_name)
    end = time.time()
    print(f"time : {(end - start)//60}분 {(end - start)%60}초")

    draw_plot(train_dict, valid_dict, model_name)


In [30]:
train_dataset = CustomDataset(dataset['train'], df_train)
valid_dataset = CustomDataset(dataset['validation'], df_test)

In [31]:
del df_train
del df_test
del df_dev

In [32]:
import gc
gc.collect()

1584

In [33]:
# We want to minimize the loss! 
study = optuna.create_study(study_name='hyper-parameter-search') 
# Optimize the objective using 5 different trials 
study.optimize(objective, n_trials=10)
# Gives the best loss value 
print(study.best_value) 
# Gives the best hyperparameter values to get the best loss value 
print(study.best_params) 
# Return info about best Trial such as start and end datetime, hyperparameters  
print(study.best_trial)

[32m[I 2022-05-30 05:58:04,288][0m A new study created in memory with name: hyper-parameter-search[0m
Some weights of the model checkpoint at klue/bert-base were not used when initializing BertForNextSentencePrediction: ['cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.bias', 'cls.predictions.decoder.weight']
- This IS expected if you are initializing BertForNextSentencePrediction from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForNextSentencePrediction from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Total train steps with 1 epochs: 543
*****Epoch 0 Train Start*****
Epoch: 0, Step : 128, LR : 0.0000276071, Avg Loss : 0.9077, f1 score : 0.8391


[33m[W 2022-05-30 05:59:47,012][0m Trial 0 failed because of the following error: RuntimeError('CUDA out of memory. Tried to allocate 306.00 MiB (GPU 0; 14.76 GiB total capacity; 12.84 GiB already allocated; 93.75 MiB free; 13.27 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF')[0m
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/optuna/study/_optimize.py", line 213, in _run_trial
    value_or_values = func(trial)
  File "<ipython-input-29-3f67a126ce77>", line 15, in objective
    train_dict, valid_dict = train(model, optimizer, scheduler, train_dataloader, valid_dataloader, 10, model_name)
  File "<ipython-input-23-e0a4455c4325>", line 30, in train
    probs = model(**batch_input)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_ca

RuntimeError: ignored

### Train : 'snunlp/KR-Medium'

batch_size : {train : 32, valid : 32}

In [None]:
train_batch_size, valid_batch_size = 32, 32
train_dataloader, valid_dataloader = make_dataloader(dataset, "snunlp/KR-Medium", train_batch_size, valid_batch_size)

In [None]:
epochs=20
model, optimizer, scheduler = initializer(train_dataloader, epochs, "snunlp/KR-Medium")

t0 = time.time()
train(model, train_dataloader, valid_dataloader, epochs)
print(time.time()-t0)

Some weights of the model checkpoint at snunlp/KR-Medium were not used when initializing BertForNextSentencePrediction: ['cls.predictions.decoder.bias', 'cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight']
- This IS expected if you are initializing BertForNextSentencePrediction from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForNextSentencePrediction from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Total train steps with 20 epochs: 7300
*****Epoch 0 Train Start*****
Epoch: 0, Step : 64, LR : 0.0000, Avg Loss : 1.6681, f1 score : 0.9286
Epoch: 0, Step : 128, LR : 0.0000, Avg Loss : 0.3266, f1 score : 1.0000
Epoch: 0, Step : 192, LR : 0.0000, Avg Loss : 0.2903, f1 score : 0.8000
Epoch: 0, Step : 256, LR : 0.0000, Avg Loss : 0.2625, f1 score : 0.9167
Epoch: 0, Step : 320, LR : 0.0000, Avg Loss : 0.2438, f1 score : 1.0000
Epoch 0 Total Mean Loss : 0.5240
*****Epoch 0 Train Finish*****

*****Epoch 0 Valid Start*****
Epoch 0 Valid Loss : 0.8607 Valid Acc : 70.54 Valid f1 : 0.7144
*****Epoch 0 Valid Finish*****

Saving epoch 0 checkpoint at /content/drive/MyDrive/Colab Notebooks/nlp/model0_loss:0.5240_f1:0.7144.ckpt
*****Epoch 1 Train Start*****
Epoch: 1, Step : 64, LR : 0.0000, Avg Loss : 0.1896, f1 score : 1.0000
Epoch: 1, Step : 128, LR : 0.0000, Avg Loss : 0.1573, f1 score : 0.9333
Epoch: 1, Step : 192, LR : 0.0000, Avg Loss : 0.1561, f1 score : 0.9677
Epoch: 1, Step : 256, LR : 0.0

In [None]:
if valid_dataloader is not None:
    valid_loss, valid_acc, valid_f1, valid_pearson = validate(model, valid_dataloader)
    print(f"Epoch 1 Valid Loss : {valid_loss:.4f} Valid Acc : {valid_acc:.2f} Valid f1 : {valid_f1:.4f}")
    print(f"pearson 상관 계수 ; {valid_pearson}")
    print(f"*****Epoch 1 Valid Finish*****\n")

Epoch 1 Valid Loss : 0.5634 Valid Acc : 79.02 Valid f1 : 0.7773
pearson 상관 계수 ; tensor([[1.0000, 0.8687],
        [0.8687, 1.0000]], device='cuda:0')
*****Epoch 1 Valid Finish*****



### Train : 'snunlp/KR-Medium'
batch_size : {train : 64, valid : 64}

In [None]:
train_batch_size, valid_batch_size = 64, 64
train_dataloader, valid_dataloader = make_dataloader(dataset, "snunlp/KR-Medium", train_batch_size, valid_batch_size)
epochs=20
model, optimizer, scheduler = initializer(train_dataloader, epochs, "snunlp/KR-Medium")

t0 = time.time()
train(model, train_dataloader, valid_dataloader, epochs)
print(time.time()-t0)

Downloading:   0%|          | 0.00/28.0 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/337 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/140k [00:00<?, ?B/s]

<class 'torch.utils.data.dataloader.DataLoader'>


Downloading:   0%|          | 0.00/389M [00:00<?, ?B/s]

Some weights of the model checkpoint at snunlp/KR-Medium were not used when initializing BertForNextSentencePrediction: ['cls.predictions.transform.dense.bias', 'cls.predictions.bias', 'cls.predictions.decoder.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.weight']
- This IS expected if you are initializing BertForNextSentencePrediction from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForNextSentencePrediction from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Total train steps with 20 epochs: 3660
*****Epoch 0 Train Start*****
Epoch: 0, Step : 64, LR : 0.0000196448, Avg Loss : 1.2513, f1 score : 0.8761
Epoch: 0, Step : 128, LR : 0.0000192951, Avg Loss : 0.3020, f1 score : 1.8197
Epoch 0 Total Mean Loss : 0.6297
*****Epoch 0 Train Finish*****

*****Epoch 0 Valid Start*****
Epoch 0 Valid Loss : 0.8089 Valid Acc : 71.2798 Valid f1 : 0.6892
pearson 상관 계수 ; tensor([[1.0000, 0.7987],
        [0.7987, 1.0000]], device='cuda:0')
*****Epoch 0 Valid Finish*****

Saving epoch 0 checkpoint at /content/drive/MyDrive/Colab Notebooks/nlp/model0_loss:0.8089_f1:0.6892.ckpt
*****Epoch 1 Train Start*****
Epoch: 1, Step : 64, LR : 0.0000186448, Avg Loss : 0.1875, f1 score : 0.9514
Epoch: 1, Step : 128, LR : 0.0000182951, Avg Loss : 0.1827, f1 score : 1.9202
Epoch 1 Total Mean Loss : 0.1818
*****Epoch 1 Train Finish*****

*****Epoch 1 Valid Start*****
Epoch 1 Valid Loss : 0.7495 Valid Acc : 72.1478 Valid f1 : 0.7061
pearson 상관 계수 ; tensor([[1.0000, 0.8226],
   

### Train : 'monologg/kobert' => X
[git](https://github.com/monologg/KoBERT-Transformers)

- sub task에 sts가 없음
- NSMC, NER, KorQuAD


In [None]:
train_dataloader, valid_dataloader = make_dataloader(dataset, 'monologg/kobert')
epochs=20
model, optimizer, scheduler = initializer(train_dataloader, epochs, 'monologg/kobert')   # 여기에 config 선언되어있다.

start = time.time()
train(model, train_dataloader, valid_dataloader, epochs)
end = time.time()
print(f"time : {(end - start)//60}분 {(end - start)%60}초")

Downloading:   0%|          | 0.00/51.0 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/426 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/76.0k [00:00<?, ?B/s]

<class 'torch.utils.data.dataloader.DataLoader'>


Downloading:   0%|          | 0.00/352M [00:00<?, ?B/s]

Some weights of BertForNextSentencePrediction were not initialized from the model checkpoint at monologg/kobert and are newly initialized: ['cls.seq_relationship.weight', 'cls.seq_relationship.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Total train steps with 20 epochs: 7300
*****Epoch 0 Train Start*****
Epoch: 0, Step : 64, LR : 0.0000198219, Avg Loss : 2.6811, f1 score : 0.3158
Epoch: 0, Step : 128, LR : 0.0000196466, Avg Loss : 2.3828, f1 score : 0.5517
Epoch: 0, Step : 192, LR : 0.0000194712, Avg Loss : 2.3224, f1 score : 0.2857
Epoch: 0, Step : 256, LR : 0.0000192959, Avg Loss : 2.1728, f1 score : 0.3636
Epoch: 0, Step : 320, LR : 0.0000191205, Avg Loss : 2.0097, f1 score : 0.6667
Epoch 0 Total Mean Loss : 2.2716
*****Epoch 0 Train Finish*****

*****Epoch 0 Valid Start*****
Epoch 0 Valid Loss : 2.7981 Valid Acc : 56.01 Valid f1 : 0.4805
pearson 상관 계수 ; tensor([[1.0000, 0.1509],
        [0.1509, 1.0000]], device='cuda:0')
*****Epoch 0 Valid Finish*****

*****Epoch 1 Train Start*****
Epoch: 1, Step : 64, LR : 0.0000188219, Avg Loss : 1.8499, f1 score : 0.6250
Epoch: 1, Step : 128, LR : 0.0000186466, Avg Loss : 1.8260, f1 score : 0.4615
Epoch: 1, Step : 192, LR : 0.0000184712, Avg Loss : 1.8183, f1 score : 0.8800
Ep

### train : 'klue/bert-base'
batch_size : {train : 32, valid : 32}
- Epoch 15 : 가장 작은 valid_loss를 가지고 있다.
  - Valid Loss : 0.3798
  - Valid Acc : 81.75
  - Valid f1 : 0.8095
  - pearson 상관 계수 : [[1.0000, 0.9116],[0.9116, 1.0000]]


In [None]:
train_batch_size, valid_batch_size = 64, 64
train_dataloader = make_dataloader(dataset['train'], 'klue/bert-base', train_batch_size, df_train, 'train')
valid_dataloader = make_dataloader(dataset['valid'], 'klue/bert-base', valid_batch_size, df_dev, 'valid')

epochs=20
model, optimizer, scheduler = initializer(train_dataloader, epochs, 'klue/bert-base')   # 여기에 config 선언되어있다.

start = time.time()
train(model, train_dataloader, valid_dataloader, epochs)
end = time.time()
print(f"time : {(end - start)//60}분 {(end - start)%60}초")

Downloading:   0%|          | 0.00/289 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/425 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/243k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/483k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/125 [00:00<?, ?B/s]

<class 'torch.utils.data.dataloader.DataLoader'>


Downloading:   0%|          | 0.00/424M [00:00<?, ?B/s]

Some weights of the model checkpoint at klue/bert-base were not used when initializing BertForNextSentencePrediction: ['cls.predictions.decoder.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.decoder.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.bias']
- This IS expected if you are initializing BertForNextSentencePrediction from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForNextSentencePrediction from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Total train steps with 20 epochs: 7300
*****Epoch 0 Train Start*****
Epoch: 0, Step : 64, LR : 0.0000198219, Avg Loss : 1.3634, f1 score : 0.7857
Epoch: 0, Step : 128, LR : 0.0000196466, Avg Loss : 0.2565, f1 score : 0.9714
Epoch: 0, Step : 192, LR : 0.0000194712, Avg Loss : 0.2213, f1 score : 0.9677
Epoch: 0, Step : 256, LR : 0.0000192959, Avg Loss : 0.2065, f1 score : 0.9565
Epoch: 0, Step : 320, LR : 0.0000191205, Avg Loss : 0.1954, f1 score : 0.8966
Epoch 0 Total Mean Loss : 0.4213
*****Epoch 0 Train Finish*****

*****Epoch 0 Valid Start*****
Epoch 0 Valid Loss : 0.6604 Valid Acc : 76.97 Valid f1 : 0.7891
pearson 상관 계수 ; tensor([[1.0000, 0.8633],
        [0.8633, 1.0000]], device='cuda:0')
*****Epoch 0 Valid Finish*****

Saving epoch 0 checkpoint at /content/drive/MyDrive/Colab Notebooks/nlp/model0_loss:0.6604_f1:0.7891.ckpt
*****Epoch 1 Train Start*****
Epoch: 1, Step : 64, LR : 0.0000188219, Avg Loss : 0.1142, f1 score : 0.9600
Epoch: 1, Step : 128, LR : 0.0000186466, Avg Loss : 

### Train : 'klue/bert-base'
train_batch_size, valid_batch_size = 64, 64

In [None]:
train_batch_size, valid_batch_size = 64, 64
train_dataloader, valid_dataloader = make_dataloader(dataset, 'klue/bert-base', train_batch_size, valid_batch_size)
epochs=20
model, optimizer, scheduler = initializer(train_dataloader, epochs, 'klue/bert-base')

start = time.time()
train(model, train_dataloader, valid_dataloader, epochs)
end = time.time()
print(f"time : {(end - start)//60}분 {(end - start)%60}초")

Downloading:   0%|          | 0.00/289 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/425 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/243k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/483k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/125 [00:00<?, ?B/s]

<class 'torch.utils.data.dataloader.DataLoader'>


Downloading:   0%|          | 0.00/424M [00:00<?, ?B/s]

Some weights of the model checkpoint at klue/bert-base were not used when initializing BertForNextSentencePrediction: ['cls.predictions.transform.dense.bias', 'cls.predictions.bias', 'cls.predictions.decoder.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.weight']
- This IS expected if you are initializing BertForNextSentencePrediction from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForNextSentencePrediction from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Total train steps with 20 epochs: 3660
*****Epoch 0 Train Start*****
Epoch: 0, Step : 64, LR : 0.0000196448, Avg Loss : 1.0730, f1 score : 0.8849
Epoch: 0, Step : 128, LR : 0.0000192951, Avg Loss : 0.2309, f1 score : 0.9411
Epoch 0 Total Mean Loss : 0.5189
*****Epoch 0 Train Finish*****

*****Epoch 0 Valid Start*****
Epoch 0 Valid Loss : 0.6289 Valid Acc : 73.7103 Valid f1 : 0.7504
pearson 상관 계수 ; tensor([[1.0000, 0.8662],
        [0.8662, 1.0000]], device='cuda:0')
*****Epoch 0 Valid Finish*****

Saving epoch 0 checkpoint at /content/drive/MyDrive/Colab Notebooks/nlp/model0_loss:0.6289_f1:0.7504.ckpt
*****Epoch 1 Train Start*****
Epoch: 1, Step : 64, LR : 0.0000186448, Avg Loss : 0.1369, f1 score : 0.9609
Epoch: 1, Step : 128, LR : 0.0000182951, Avg Loss : 0.1365, f1 score : 0.9631
Epoch 1 Total Mean Loss : 0.1387
*****Epoch 1 Train Finish*****

*****Epoch 1 Valid Start*****
Epoch 1 Valid Loss : 0.6064 Valid Acc : 79.0923 Valid f1 : 0.8043
pearson 상관 계수 ; tensor([[1.0000, 0.8772],
   

### Train : 'klue/bert-base' 
특수문자, 영어 제거

train_batch_size, valid_batch_size = 64, 64

In [None]:
train_batch_size, valid_batch_size = 64, 64
train_dataloader, valid_dataloader = make_dataloader(dataset, 'klue/bert-base', train_batch_size, valid_batch_size)
epochs=20
model, optimizer, scheduler = initializer(train_dataloader, epochs, 'klue/bert-base')

start = time.time()
train(model, train_dataloader, valid_dataloader, epochs)
end = time.time()
print(f"time : {(end - start)//60}분 {(end - start)%60}초")

<class 'torch.utils.data.dataloader.DataLoader'>


Some weights of the model checkpoint at klue/bert-base were not used when initializing BertForNextSentencePrediction: ['cls.predictions.transform.dense.bias', 'cls.predictions.bias', 'cls.predictions.decoder.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.weight']
- This IS expected if you are initializing BertForNextSentencePrediction from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForNextSentencePrediction from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Total train steps with 20 epochs: 3660
*****Epoch 0 Train Start*****
Epoch: 0, Step : 64, LR : 0.0000196448, Avg Loss : 1.1754, f1 score : 0.8771
Epoch: 0, Step : 128, LR : 0.0000192951, Avg Loss : 0.2371, f1 score : 0.9376
Epoch 0 Total Mean Loss : 0.5586
*****Epoch 0 Train Finish*****

*****Epoch 0 Valid Start*****
Epoch 0 Valid Loss : 0.5560 Valid Acc : 78.3978 Valid f1 : 0.7875
pearson 상관 계수 ; tensor([[1.0000, 0.8722],
        [0.8722, 1.0000]], device='cuda:0')
*****Epoch 0 Valid Finish*****

Saving epoch 0 checkpoint at /content/drive/MyDrive/Colab Notebooks/nlp/model0_loss:0.5560_f1:0.7875.ckpt
*****Epoch 1 Train Start*****
Epoch: 1, Step : 64, LR : 0.0000186448, Avg Loss : 0.1402, f1 score : 0.9563
Epoch: 1, Step : 128, LR : 0.0000182951, Avg Loss : 0.1387, f1 score : 0.9657
Epoch 1 Total Mean Loss : 0.1360
*****Epoch 1 Train Finish*****

*****Epoch 1 Valid Start*****
Epoch 1 Valid Loss : 0.6015 Valid Acc : 77.7034 Valid f1 : 0.7883
pearson 상관 계수 ; tensor([[1.0000, 0.8804],
   

### Train : 'klue/bert-base'
특수문자, 영어 제거

train_batch_size, valid_batch_size = 128, 64

In [None]:
train_batch_size, valid_batch_size = 128, 64
train_dataloader, valid_dataloader = make_dataloader(dataset, 'klue/bert-base', train_batch_size, valid_batch_size)
epochs=20
model, optimizer, scheduler = initializer(train_dataloader, epochs, 'klue/bert-base')

start = time.time()
train(model, train_dataloader, valid_dataloader, epochs)
end = time.time()
print(f"time : {(end - start)//60}분 {(end - start)%60}초")

<class 'torch.utils.data.dataloader.DataLoader'>


Some weights of the model checkpoint at klue/bert-base were not used when initializing BertForNextSentencePrediction: ['cls.predictions.transform.dense.bias', 'cls.predictions.bias', 'cls.predictions.decoder.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.weight']
- This IS expected if you are initializing BertForNextSentencePrediction from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForNextSentencePrediction from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Total train steps with 20 epochs: 1840
*****Epoch 0 Train Start*****
Epoch: 0, Step : 64, LR : 0.0000192935, Avg Loss : 1.0742, f1 score : 0.8753
Epoch 0 Total Mean Loss : 0.8245
*****Epoch 0 Train Finish*****

*****Epoch 0 Valid Start*****
Epoch 0 Valid Loss : 0.5846 Valid Acc : 77.7034 Valid f1 : 0.7792
pearson 상관 계수 ; tensor([[1.0000, 0.8588],
        [0.8588, 1.0000]], device='cuda:0')
*****Epoch 0 Valid Finish*****

Saving epoch 0 checkpoint at /content/drive/MyDrive/Colab Notebooks/nlp/model0_loss:0.5846_f1:0.7792.ckpt
*****Epoch 1 Train Start*****
Epoch: 1, Step : 64, LR : 0.0000182935, Avg Loss : 0.1609, f1 score : 0.9570
Epoch 1 Total Mean Loss : 0.1580
*****Epoch 1 Train Finish*****

*****Epoch 1 Valid Start*****
Epoch 1 Valid Loss : 0.6139 Valid Acc : 75.9425 Valid f1 : 0.7597
pearson 상관 계수 ; tensor([[1.0000, 0.8664],
        [0.8664, 1.0000]], device='cuda:0')
*****Epoch 1 Valid Finish*****

*****Epoch 2 Train Start*****
Epoch: 2, Step : 64, LR : 0.0000172935, Avg Loss : 0.

## Train : 'klue/bert-base'
특수문자, 영어 제거

train_batch_size, valid_batch_size = 32, 32

In [None]:
train_batch_size, valid_batch_size = 32, 32
train_dataloader, valid_dataloader = make_dataloader(dataset, 'klue/bert-base', train_batch_size, valid_batch_size)
epochs=20
model, optimizer, scheduler = initializer(train_dataloader, epochs, 'klue/bert-base')

start = time.time()
train(model, train_dataloader, valid_dataloader, epochs)
end = time.time()
print(f"time : {(end - start)//60}분 {(end - start)%60}초")

<class 'torch.utils.data.dataloader.DataLoader'>


Some weights of the model checkpoint at klue/bert-base were not used when initializing BertForNextSentencePrediction: ['cls.predictions.transform.dense.bias', 'cls.predictions.bias', 'cls.predictions.decoder.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.weight']
- This IS expected if you are initializing BertForNextSentencePrediction from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForNextSentencePrediction from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Total train steps with 20 epochs: 7300
*****Epoch 0 Train Start*****
Epoch: 0, Step : 64, LR : 0.0000198219, Avg Loss : 1.4178, f1 score : 0.8444
Epoch: 0, Step : 128, LR : 0.0000196466, Avg Loss : 0.2520, f1 score : 0.9341
Epoch: 0, Step : 192, LR : 0.0000194712, Avg Loss : 0.2168, f1 score : 0.9472
Epoch: 0, Step : 256, LR : 0.0000192959, Avg Loss : 0.2055, f1 score : 0.9543
Epoch: 0, Step : 320, LR : 0.0000191205, Avg Loss : 0.2022, f1 score : 0.9418
Epoch 0 Total Mean Loss : 0.4295
*****Epoch 0 Train Finish*****

*****Epoch 0 Valid Start*****
Epoch 0 Valid Loss : 0.6574 Valid Acc : 76.9695 Valid f1 : 0.7838
pearson 상관 계수 ; tensor([[1.0000, 0.8604],
        [0.8604, 1.0000]], device='cuda:0')
*****Epoch 0 Valid Finish*****

Saving epoch 0 checkpoint at /content/drive/MyDrive/Colab Notebooks/nlp/model0_loss:0.6574_f1:0.7838.ckpt
*****Epoch 1 Train Start*****
Epoch: 1, Step : 64, LR : 0.0000188219, Avg Loss : 0.1261, f1 score : 0.9679
Epoch: 1, Step : 128, LR : 0.0000186466, Avg Loss 

## Train : 'klue/bert-base'
특수문자, 영어 제거

train_batch_size, valid_batch_size = 16, 16

In [None]:
train_batch_size, valid_batch_size = 16, 16
train_dataloader, valid_dataloader = make_dataloader(dataset, 'klue/bert-base', train_batch_size, valid_batch_size)
epochs=20
model, optimizer, scheduler = initializer(train_dataloader, epochs, 'klue/bert-base')

start = time.time()
train_dict_batch16, valid_dict_batch16 = train(model, train_dataloader, valid_dataloader, epochs)
end = time.time()
print(f"time : {(end - start)//60}분 {(end - start)%60}초")

<class 'torch.utils.data.dataloader.DataLoader'>


Some weights of the model checkpoint at klue/bert-base were not used when initializing BertForNextSentencePrediction: ['cls.predictions.transform.dense.bias', 'cls.predictions.bias', 'cls.predictions.decoder.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.weight']
- This IS expected if you are initializing BertForNextSentencePrediction from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForNextSentencePrediction from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Total train steps with 20 epochs: 14600
*****Epoch 0 Train Start*****
Epoch: 0, Step : 64, LR : 0.0000199110, Avg Loss : 1.6044, f1 score : 0.8097
Epoch: 0, Step : 128, LR : 0.0000198233, Avg Loss : 0.3461, f1 score : 1.7349
Epoch: 0, Step : 192, LR : 0.0000197356, Avg Loss : 0.2768, f1 score : 2.6833
Epoch: 0, Step : 256, LR : 0.0000196479, Avg Loss : 0.2293, f1 score : 3.6206
Epoch: 0, Step : 320, LR : 0.0000195603, Avg Loss : 0.2229, f1 score : 4.5600
Epoch: 0, Step : 384, LR : 0.0000194726, Avg Loss : 0.1985, f1 score : 5.4987
Epoch: 0, Step : 448, LR : 0.0000193849, Avg Loss : 0.2023, f1 score : 6.4344
Epoch: 0, Step : 512, LR : 0.0000192973, Avg Loss : 0.1822, f1 score : 7.3787
Epoch: 0, Step : 576, LR : 0.0000192096, Avg Loss : 0.1933, f1 score : 8.3375
Epoch: 0, Step : 640, LR : 0.0000191219, Avg Loss : 0.1976, f1 score : 9.2881
Epoch: 0, Step : 704, LR : 0.0000190342, Avg Loss : 0.1784, f1 score : 10.2434
Epoch 0 Total Mean Loss : 0.3442
Epoch 0 Total Mean f1 : 0.9310
*****Epo

  _warn_prf(average, "true nor predicted", "F-score is", len(true_sum))


Epoch: 16, Step : 320, LR : 0.0000035603, Avg Loss : 0.0078, f1 score : 4.9345
Epoch: 16, Step : 384, LR : 0.0000034726, Avg Loss : 0.0068, f1 score : 5.9159
Epoch: 16, Step : 448, LR : 0.0000033849, Avg Loss : 0.0073, f1 score : 6.8954
Epoch: 16, Step : 512, LR : 0.0000032973, Avg Loss : 0.0068, f1 score : 7.8827
Epoch: 16, Step : 576, LR : 0.0000032096, Avg Loss : 0.0077, f1 score : 8.8733
Epoch: 16, Step : 640, LR : 0.0000031219, Avg Loss : 0.0075, f1 score : 9.8623
Epoch: 16, Step : 704, LR : 0.0000030342, Avg Loss : 0.0065, f1 score : 10.8552
Epoch 16 Total Mean Loss : 0.0072
Epoch 16 Total Mean f1 : 0.9854
*****Epoch 16 Train Finish*****

*****Epoch 16 Valid Start*****
Epoch 16 Valid Loss : 0.3944 Valid Acc : 83.4957 Valid f1 : 0.8242
pearson 상관 계수 ; tensor([[1.0000, 0.9119],
        [0.9119, 1.0000]], device='cuda:0')
*****Epoch 16 Valid Finish*****

*****Epoch 17 Train Start*****
Epoch: 17, Step : 64, LR : 0.0000029110, Avg Loss : 0.0066, f1 score : 0.9886
Epoch: 17, Step : 128