<img src = 'https://github.com/rungjoo/CoMPM/raw/master/image/model.png'>

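Two modules are proposed

- CoM (Context Module)
  - An ordinary pre-trained model that is fine-tuned
  - Takes all utterances so far as input
    - [utterance 1] [emotion]
    - [utterance 1] [utterance 2] [emotion]
    - [utterance 1] [utterance 2] [utterance 3] [emotion]
  - The representation of each input is taken from the CLS vector

- PM (Pretrained Memory Module)
  - Uses the same pre-trained model, but captures context-independent features of individual utterances
  - Uses the LM as knowledge
  - Converts the previous utterances in the input into a single vector via a GRU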


---
# **Library installation**
---

In [1]:
!pip install transformers==4.4.0

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting transformers==4.4.0
  Downloading transformers-4.4.0-py3-none-any.whl (2.1 MB)
Collecting sacremoses
  Downloading sacremoses-0.0.53.tar.gz (880 kB)
Collecting tokenizers<0.11,>=0.10.1
  Downloading tokenizers-0.10.3-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (3.3 MB)
Building wheels for collected packages: sacremoses
  Building wheel for sacremoses (setup.py) ... done
  Created wheel for sacremoses: filename=sacremoses-0.0.53-py3-none-any.whl size=895260 sha256=b2210839e467900d96534e0b72914adb683120608e6962c34a1bcbb77065242a
  Stored in directory: /root/.cache/pip/wheels/82/ab/9b/c15899bf659ba74f623ac776e861cf2eb8608c1825ddec66

In [2]:
import torch
import transformers
transformers.__version__
torch.__version__

'1.12.1+cu113'

In [3]:
!nvidia-smi # check the GPU allocation (15109MiB of memory available)

Wed Dec  7 11:32:21 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   60C    P0    29W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

---
# **Data download**
---

In [4]:
!git clone https://github.com/declare-lab/MELD.git

Cloning into 'MELD'...
remote: Enumerating objects: 487, done.
remote: Counting objects: 100% (12/12), done.
remote: Compressing objects: 100% (12/12), done.
remote: Total 487 (delta 6), reused 0 (delta 0), pack-reused 475
Receiving objects: 100% (487/487), 8.12 MiB | 15.80 MiB/s, done.
Resolving deltas: 100% (254/254), done.


In [5]:
import glob # collect the list of matching file paths
data_path = "./MELD/data/MELD/*.csv"
data_path_list = glob.glob(data_path)
print(data_path_list)

['./MELD/data/MELD/test_sent_emo.csv', './MELD/data/MELD/dev_sent_emo.csv', './MELD/data/MELD/train_sent_emo.csv']


---
# Inspecting the data
---

In [6]:
import csv
import logging
import os
import pdb

import pandas as pd
import torch
import torch.nn as nn
from sklearn.metrics import precision_recall_fscore_support
from torch.utils.data import DataLoader, Dataset
from tqdm.notebook import tqdm
from transformers import RobertaModel, RobertaTokenizer, get_linear_schedule_with_warmup

In [7]:
!head -5 './MELD/data/MELD/dev_sent_emo.csv'

Sr No.,Utterance,Speaker,Emotion,Sentiment,Dialogue_ID,Utterance_ID,Season,Episode,StartTime,EndTime
1,"Oh my God, he’s lost it. He’s totally lost it.",Phoebe,sadness,negative,0,0,4,7,"00:20:57,256","00:21:00,049"
2,What?,Monica,surprise,negative,0,1,4,7,"00:21:01,927","00:21:03,261"
3,"Or! Or, we could go to the bank, close our accounts and cut them off at the source.",Ross,neutral,neutral,1,0,4,4,"00:12:24,660","00:12:30,915"
4,You’re a genius!,Chandler,joy,positive,1,1,4,4,"00:12:32,334","00:12:33,960"


In [8]:
# Print the data: only the header row of the first CSV file
for data_path in data_path_list:
    with open(data_path, 'r') as f:  # the file is closed automatically
        rdr = csv.reader(f)

        for line in rdr:
            print(line)
            break

    break

['Sr No.', 'Utterance', 'Speaker', 'Emotion', 'Sentiment', 'Dialogue_ID', 'Utterance_ID', 'Season', 'Episode', 'StartTime', 'EndTime']


---
# Splitting the data into sessions
----

In [9]:
data = pd.read_csv('./MELD/data/MELD/dev_sent_emo.csv')

In [10]:
def split(session):
    final_data = []
    split_session = []
    for line in session:
        split_session.append(line)
        final_data.append(split_session[:])    
    return final_data

dataset = []
speaker_set = []
count = 0
for i in range(max(data['Dialogue_ID'])+1):
  new_data = data[data['Dialogue_ID'] == i]
  new_data = new_data[['Utterance','Speaker','Emotion','Dialogue_ID']]
  new_data = new_data.reset_index()
  value = []
  for j in range(len(new_data)):
    if new_data['Speaker'][j] in speaker_set:
      uniq_speaker = speaker_set.index(new_data['Speaker'][j]) # look up the speaker's index
    else:
      speaker_set.append(new_data['Speaker'][j])  # not seen yet: register a new speaker
      uniq_speaker = speaker_set.index(new_data['Speaker'][j]) # look up the speaker's index
      
    value.append([uniq_speaker, new_data['Utterance'][j], new_data['Emotion'][j]])
  dataset += split(value)

print(dataset)

[[[0, 'Oh my God, he’s lost it. He’s totally lost it.', 'sadness']], [[0, 'Oh my God, he’s lost it. He’s totally lost it.', 'sadness'], [1, 'What?', 'surprise']], [[2, 'Or! Or, we could go to the bank, close our accounts and cut them off at the source.', 'neutral']], [[2, 'Or! Or, we could go to the bank, close our accounts and cut them off at the source.', 'neutral'], [3, 'You’re a genius!', 'joy']], [[2, 'Or! Or, we could go to the bank, close our accounts and cut them off at the source.', 'neutral'], [3, 'You’re a genius!', 'joy'], [4, 'Aww, man, now we won’t be bank buddies!', 'sadness']], [[2, 'Or! Or, we could go to the bank, close our accounts and cut them off at the source.', 'neutral'], [3, 'You’re a genius!', 'joy'], [4, 'Aww, man, now we won’t be bank buddies!', 'sadness'], [3, 'Now, there’s two reasons.', 'neutral']], [[2, 'Or! Or, we could go to the bank, close our accounts and cut them off at the source.', 'neutral'], [3, 'You’re a genius!', 'joy'], [4, 'Aww, man, now we 
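The nested output above is easier to read on a small example. The sketch below reuses the notebook's `split` function on a hypothetical three-utterance dialogue (the toy data is an assumption, not taken from MELD) to show that every dialogue yields one sample per utterance: the cumulative prefix is the context, and the emotion of the last utterance in the prefix is the label.

```python
def split(session):
    """Return every cumulative prefix of a session (a list of utterance records)."""
    final_data = []
    split_session = []
    for line in session:
        split_session.append(line)
        final_data.append(split_session[:])  # copy, so later appends do not leak in
    return final_data


# Hypothetical dialogue; each record is [speaker_id, utterance, emotion]
toy_session = [
    [0, "Hi.", "neutral"],
    [1, "What?", "surprise"],
    [0, "Never mind.", "sadness"],
]

samples = split(toy_session)
for sample in samples:
    context = [utt for _, utt, _ in sample]
    target_emotion = sample[-1][2]  # label of the last utterance in the prefix
    print(context, "->", target_emotion)
```

A dialogue with n utterances therefore contributes n training samples, which is why the number of samples is much larger than the number of dialogues.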

---
# Background on pre-trained models
---


In [11]:
""" Load the tokenizer """
# https://github.com/thunlp/PLMpapers
tokenizer = RobertaTokenizer.from_pretrained('roberta-base')

Downloading:   0%|          | 0.00/899k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/456k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/1.36M [00:00<?, ?B/s]

In [12]:
""" Run the tokenizer """
res = tokenizer('hello. this is fastcampus')
print(res)
res = tokenizer.encode('hello. this is fastcampus')
print(res)
res = tokenizer(['hello. this is fastcampus', "what are you doing?"])
print(res)
res = tokenizer(['hello. this is fastcampus', "what are you doing?"], add_special_tokens=False)
print(res)

{'input_ids': [0, 42891, 4, 42, 16, 1769, 28135, 2], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1]}
[0, 42891, 4, 42, 16, 1769, 28135, 2]
{'input_ids': [[0, 42891, 4, 42, 16, 1769, 28135, 2], [0, 12196, 32, 47, 608, 116, 2]], 'attention_mask': [[1, 1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 1]]}
{'input_ids': [[42891, 4, 42, 16, 1769, 28135], [12196, 32, 47, 608, 116]], 'attention_mask': [[1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1]]}
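Comparing the printed outputs, the only difference `add_special_tokens=False` makes is the leading id `0` (`<s>`) and the trailing id `2` (`</s>`). A minimal sketch of that relationship, using the ids printed above (the helper name `wrap_with_special_tokens` is hypothetical and is not part of the `transformers` API):

```python
BOS_ID, EOS_ID = 0, 2  # ids of <s> and </s>, as seen in the printed output above


def wrap_with_special_tokens(ids):
    """Mimic what the tokenizer adds around a single sentence by default."""
    return [BOS_ID] + ids + [EOS_ID]


plain = [42891, 4, 42, 16, 1769, 28135]  # 'hello. this is fastcampus' without special tokens
wrapped = wrap_with_special_tokens(plain)
attention_mask = [1] * len(wrapped)      # every real token is attended to

print(wrapped)
print(attention_mask)
```

Disabling the automatic special tokens is useful when the special tokens are written into the input string by hand, because otherwise they would be added twice.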


---
# Saving as .py
----

In [13]:
#### Save the Dataset as a .py file ####
#!touch dataset.py

---
# Dataset code
---

In [14]:
# from torch.utils.data import Dataset
# from transformers import RobertaTokenizer
# import csv
# from torch.utils.data import Dataset
# import torch


In [15]:
def split(session):
    final_data = []
    split_session = []
    for line in session:
        split_session.append(line)
        final_data.append(split_session[:])    
    return final_data

class data_loader(Dataset):
    def __init__(self, data_path):
        data = pd.read_csv(data_path)

        self.tokenizer = RobertaTokenizer.from_pretrained('roberta-base')

        self.dataset = []
        speaker_set = []
        for i in range(max(data['Dialogue_ID'])+1):
            new_data = data[data['Dialogue_ID'] == i]
            new_data = new_data[['Utterance','Speaker','Emotion','Dialogue_ID']]
            new_data = new_data.reset_index()
            value = []
            for j in range(len(new_data)):
                if new_data['Speaker'][j] in speaker_set:
                    uniq_speaker = speaker_set.index(new_data['Speaker'][j]) # look up the speaker's index
                else:
                    speaker_set.append(new_data['Speaker'][j])  # not seen yet: register a new speaker
                    uniq_speaker = speaker_set.index(new_data['Speaker'][j]) # look up the speaker's index

                value.append([uniq_speaker, new_data['Utterance'][j], new_data['Emotion'][j]])
            self.dataset += split(value)

        """ Added: emotion label list (the list index is the class id) """
        self.emoList = ['anger', 'disgust', 'fear', 'joy', 'neutral', 'sadness', 'surprise']

        
    def __len__(self): # standard Dataset method
        return len(self.dataset)
    
    def __getitem__(self, idx): # standard Dataset method
        return self.dataset[idx]
    

    
    def padding(self, batch_input_token):
        
        ############################################################################################################################################
        ############################################ truncate inputs longer than 512 tokens #########################################################
        ############################################################################################################################################
        batch_token_ids, batch_attention_masks = batch_input_token['input_ids'], batch_input_token['attention_mask']
        trunc_batch_token_ids, trunc_batch_attention_masks = [], []

        for batch_token_id, batch_attention_mask in zip(batch_token_ids, batch_attention_masks):
            if len(batch_token_id) > self.tokenizer.model_max_length: # longer than max_length (512)
                # CLS token + the last 511 tokens: the emotion to recognize belongs to the last utterance,
                # so it is preferable to keep the tail of the input and cut from the front
                trunc_batch_token_id = [batch_token_id[0]] + batch_token_id[1:][-self.tokenizer.model_max_length+1:]
                trunc_batch_attention_mask = [batch_attention_mask[0]] + batch_attention_mask[1:][-self.tokenizer.model_max_length+1:] # same treatment for the mask
                trunc_batch_token_ids.append(trunc_batch_token_id)
                trunc_batch_attention_masks.append(trunc_batch_attention_mask)
            else:
                trunc_batch_token_ids.append(batch_token_id)
                trunc_batch_attention_masks.append(batch_attention_mask)
        ############################################################################################################################################
        
        ###############################################################################################################################
        ################################################### pad with the padding token ###################################################
        ###############################################################################################################################
        # lengths [10, 30, 50]
        # become  [50, 50, 50]
        # 50-10=40 and 50-30=20 positions are filled with the padding token <pad>
        max_length = max([len(x) for x in trunc_batch_token_ids])
        padding_tokens, padding_attention_masks = [], []
        for batch_token_id, batch_attention_mask in zip(trunc_batch_token_ids, trunc_batch_attention_masks):
            # fill with padding tokens
            padding_tokens.append(batch_token_id + [self.tokenizer.pad_token_id for _ in range(max_length-len(batch_token_id))])
            # extend the attention mask with 0s
            padding_attention_masks.append(batch_attention_mask + [0 for _ in range(max_length-len(batch_token_id))])
        # convert the lists to tensors
        return torch.tensor(padding_tokens), torch.tensor(padding_attention_masks)
        ###############################################################################################################################
    
    def collate_fn(self, sessions): # builds one batch
        '''
            input:
                data: [(session1), (session2), ... ]
            return:
                batch_input_tokens_pad: (B, L) padded
                batch_labels: (B)
        '''
        ## [utt1, utt2, ..., utt8]
        # Using utt1..utt7 as context makes the input long.
        # utt1 is probably less important for utt8.
        # The context length may be capped as appropriate.
        # With a cap of 3, the input becomes [utt5, utt6, utt7, utt8].
        """ Added """
        batch_input, batch_labels = [], []
        batch_PM_input = []
        for session in sessions:
            input_str = self.tokenizer.cls_token
            
            ############################################ PM ############################################
            current_speaker, current_utt, current_emotion = session[-1] # the last utterance of the input is the target
            PM_input = []
            for i, line in enumerate(session):
                speaker, utt, emotion = line # unpack the fields
                input_str += " " + utt + self.tokenizer.sep_token # <CLS> + utterance + <SEP> + ... + utterance + <SEP>
                if i < len(session)-1 and current_speaker == speaker: # earlier utterance by the same speaker as the target
                    PM_input.append(self.tokenizer.encode(utt, add_special_tokens=True, return_tensors='pt')) # tokenize
                    # [cls_token, tokens, sep_token]
            ############################################################################################

            ############################################ CoM ###########################################
            batch_input.append(input_str)
            batch_labels.append(self.emoList.index(current_emotion)) # label of the target (last) utterance
            batch_PM_input.append(PM_input)
            ############################################################################################
        
        batch_input_token = self.tokenizer(batch_input, add_special_tokens=False) # special tokens are already in input_str
        batch_padding_token, batch_padding_attention_mask = self.padding(batch_input_token)
        
        return batch_padding_token, batch_padding_attention_mask, batch_PM_input, torch.tensor(batch_labels)
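To make the two kinds of input built in `collate_fn` concrete, here is a string-level sketch that leaves out tokenization. It assumes RoBERTa's special-token strings `<s>` and `</s>`; the helper `build_inputs` and the toy session are hypothetical. The CoM input is the whole context joined with separators, while the PM input keeps only the earlier utterances spoken by the same speaker as the target utterance.

```python
CLS, SEP = "<s>", "</s>"  # RoBERTa's cls_token and sep_token strings


def build_inputs(session):
    """session: list of [speaker_id, utterance, emotion]; the last record is the target."""
    current_speaker, _, current_emotion = session[-1]
    com_input = CLS
    pm_utterances = []
    for i, (speaker, utt, _) in enumerate(session):
        com_input += " " + utt + SEP
        # PM only sees earlier utterances of the target speaker, never the target itself
        if i < len(session) - 1 and speaker == current_speaker:
            pm_utterances.append(utt)
    return com_input, pm_utterances, current_emotion


toy_session = [
    [0, "Hi.", "neutral"],
    [1, "What?", "surprise"],
    [0, "Never mind.", "sadness"],
]
com_input, pm_utterances, label = build_inputs(toy_session)
print(com_input)
print(pm_utterances, label)
```

When the target speaker has not spoken before, the PM list is empty, which is the case the model has to handle separately.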

In [16]:
""" Check one batch """
dev_dataset = data_loader('./MELD/data/MELD/dev_sent_emo.csv')
dev_dataloader = DataLoader(dev_dataset, batch_size=3, shuffle=False, num_workers=4, collate_fn=dev_dataset.collate_fn)

for i, data in enumerate(dev_dataloader):
    if i == 1:
        print(data[0].shape,'\n')
        batch_padding_token, batch_padding_attention_mask, batch_PM_input, batch_label = data
        print("batch_padding_token", batch_padding_token,'\n')
        print("batch_padding_attention_mask", batch_padding_attention_mask,'\n')
        print("batch_PM_input", batch_PM_input,'\n')
        print("batch_label", batch_label,'\n')
        break

torch.Size([3, 58]) 

batch_padding_token tensor([[    0,  1793,   328,  1793,     6,    52,   115,   213,     7,     5,
           827,     6,   593,    84,  2349,     8,   847,   106,   160,    23,
             5,  1300,     4,     2,   370,    17,    27,   241,    10, 16333,
           328,     2,     1,     1,     1,     1,     1,     1,     1,     1,
             1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
             1,     1,     1,     1,     1,     1,     1,     1],
        [    0,  1793,   328,  1793,     6,    52,   115,   213,     7,     5,
           827,     6,   593,    84,  2349,     8,   847,   106,   160,    23,
             5,  1300,     4,     2,   370,    17,    27,   241,    10, 16333,
           328,     2,    83, 33130,     6,   313,     6,   122,    52,   351,
            17,    27,    90,    28,   827, 30489,   328,     2,     1,     1,
             1,     1,     1,     1,     1,     1,     1,     1],
        [    0,  1793,   328,  1793,
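The trailing `1`s in the tensor above are padding ids. The truncate-then-pad logic of `padding` can be sketched on plain lists as follows; `truncate_and_pad` is a hypothetical stand-alone helper, `pad_id=1` matches the padding id visible in the output, and the tiny `max_len=6` stands in for the model limit of 512 so that the effect is visible.

```python
def truncate_and_pad(batch_ids, max_len, pad_id):
    """Keep the first token plus the last (max_len - 1) tokens, then right-pad to the batch maximum.

    Assumes max_len >= 2.
    """
    truncated = []
    for ids in batch_ids:
        if len(ids) > max_len:
            ids = [ids[0]] + ids[1:][-(max_len - 1):]  # CLS + tail of the sequence
        truncated.append(ids)

    longest = max(len(ids) for ids in truncated)
    padded = [ids + [pad_id] * (longest - len(ids)) for ids in truncated]
    masks = [[1] * len(ids) + [0] * (longest - len(ids)) for ids in truncated]
    return padded, masks


batch = [
    [0, 11, 12, 2],                  # short sequence: only padded
    [0, 21, 22, 23, 24, 25, 26, 2],  # long sequence: truncated from the front
]
padded, masks = truncate_and_pad(batch, max_len=6, pad_id=1)
print(padded)
print(masks)
```

Cutting from the front keeps the most recent utterances, including the target utterance at the end of the input.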

---
# Saving as .py
----

In [17]:
#### Save the Model as a .py file ####
#!touch model.py

---
# Model code
----

- Debugging

      import pdb

      Insert pdb.set_trace() at the line that raises the error to inspect cuda() errors

In [18]:
# from transformers import RobertaModel
# import torch
# import torch.nn as nn

In [19]:
class ERC_model(nn.Module):
    def __init__(self, clsNum):
        super(ERC_model, self).__init__()

        # Load the pre-trained model twice: once for CoM and once for PM
        self.com_model = RobertaModel.from_pretrained('roberta-base')
        self.pm_model = RobertaModel.from_pretrained('roberta-base')
        
        
        ############################ GRU settings used by PM ############################
        self.hiddenDim = self.com_model.config.hidden_size          # hidden dimension
        self.h0 = torch.zeros(2, 1, self.hiddenDim).cuda()          # zero initial hidden state => (num_layers * num_directions, batch, hidden_size)
        self.speakerGRU = nn.GRU(self.hiddenDim, self.hiddenDim, 2, dropout=0.3) # (input_size, hidden_size, num_layers)
                                                            # num_layers is the number of recurrent layers: num_layers=2 stacks two GRUs
        ##############################################################################
        
        """ score matrix """
        self.W = nn.Linear(self.hiddenDim, clsNum) # maps the sum of the CoM vector and the PM (GRU) vector to class scores

    def forward(self, batch_padding_token, batch_padding_attention_mask, batch_PM_input):
        
        ######################### for CoM #########################
        batch_com_out = self.com_model(input_ids=batch_padding_token, attention_mask=batch_padding_attention_mask)['last_hidden_state']
        batch_com_final = batch_com_out[:,0,:] # output vector of the CLS token
        ###########################################################
        
        ############################### through the GRU --> PM result ###############################
        batch_pm_gru_final = []
        for PM_inputs in batch_PM_input:
            if PM_inputs:
                pm_outs = []
                for PM_input in PM_inputs: # PM inputs of one sample in the batch
                    pm_out = self.pm_model(PM_input)['last_hidden_state'][:,0,:]  # ['last_hidden_state'] is the same as indexing with [0]; [:,0,:] selects the CLS token output
                    pm_outs.append(pm_out)
                pm_outs = torch.cat(pm_outs, 0).unsqueeze(1)          # (speaker_num, batch=1, hidden_dim), speaker_num = number of earlier utterances by the speaker; dim=0 stacks the tensors as rows
                pm_gru_outs, _ = self.speakerGRU(pm_outs, self.h0)    # (speaker_num, batch=1, hidden_dim)
                pm_gru_final = pm_gru_outs[-1,:,:] # (1, hidden_dim)
                batch_pm_gru_final.append(pm_gru_final)
            else:
                batch_pm_gru_final.append(torch.zeros(1, self.hiddenDim).cuda()) # no earlier utterance by this speaker: use a zero vector
        batch_pm_gru_final = torch.cat(batch_pm_gru_final, 0)        
        ##################################################################################
        
        """ score matrix """
        final_output = self.W(batch_com_final + batch_pm_gru_final) # (B, C)
        
        return final_output

In [20]:
from transformers import RobertaModel
model = RobertaModel.from_pretrained('roberta-base')

Downloading:   0%|          | 0.00/481 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/501M [00:00<?, ?B/s]

---
# py 저장
----

In [21]:
#### Save the training code as a .py file ####
#!touch train.py

# From data loading through the training code

---
# Data loading
---

In [22]:
# import torch
# from transformers import get_linear_schedule_with_warmup
# from tqdm import tqdm

# import os
# from sklearn.metrics import precision_recall_fscore_support
# import torch.nn as nn
# import pdb

# import logging

In [23]:
train_dataset = data_loader('./MELD/data/MELD/train_sent_emo.csv')
dev_dataset = data_loader('./MELD/data/MELD/dev_sent_emo.csv')
test_dataset = data_loader('./MELD/data/MELD/test_sent_emo.csv')

train_dataloader = DataLoader(train_dataset, batch_size=1, shuffle=True, num_workers=4, collate_fn=train_dataset.collate_fn)
dev_dataloader = DataLoader(dev_dataset, batch_size=1, shuffle=False, num_workers=4, collate_fn=dev_dataset.collate_fn)
test_dataloader = DataLoader(test_dataset, batch_size=1, shuffle=False, num_workers=4, collate_fn=test_dataset.collate_fn)

clsNum = len(train_dataset.emoList)
erc_model = ERC_model(clsNum).cuda()

---
# Evaluation code
---

In [24]:
def CalACC(model, dataloader):
    model.eval() # evaluation mode
    correct = 0 # number of correct predictions, for accuracy
    label_list = [] # true labels
    pred_list = [] # predicted labels
    
    # label arrange
    with torch.no_grad():
        for i_batch, data in enumerate(tqdm(dataloader)):
            batch_padding_token, batch_padding_attention_mask, batch_PM_input, batch_label = data
            batch_padding_token = batch_padding_token.cuda()
            batch_padding_attention_mask = batch_padding_attention_mask.cuda()
            
            # batch_PM_input holds the earlier utterances of the target speaker; it is returned as nested lists
            batch_PM_input = [[x2.cuda() for x2 in x1] for x1 in batch_PM_input] # move every tensor in the nested lists to the GPU
            batch_label = batch_label.cuda()        

            # model output (logits); use the model passed in as an argument
            pred_logits = model(batch_padding_token, batch_padding_attention_mask, batch_PM_input)
            
            # predicted label
            pred_label = pred_logits.argmax(1).item() # item() converts to a Python int (requires batch_size=1)
            # true label
            true_label = batch_label.item()
            
            pred_list.append(pred_label)
            label_list.append(true_label)
            if pred_label == true_label:
                correct += 1
        acc = correct/len(dataloader) # number of batches equals number of samples because batch_size=1
    return acc, pred_list, label_list
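The prediction and label lists returned by `CalACC` are later passed to `precision_recall_fscore_support(..., average='weighted')`. As a rough illustration of what that weighted F1 measures, here is a pure-Python sketch (the function `weighted_f1` and the toy labels are hypothetical, and this is not scikit-learn's implementation): the F1 of each class is weighted by how often the class occurs among the true labels.

```python
from collections import Counter


def weighted_f1(true_labels, pred_labels):
    """Support-weighted average of per-class F1 scores."""
    support = Counter(true_labels)
    total = len(true_labels)
    score = 0.0
    for cls, n_true in support.items():
        tp = sum(1 for t, p in zip(true_labels, pred_labels) if t == cls and p == cls)
        n_pred = sum(1 for p in pred_labels if p == cls)
        precision = tp / n_pred if n_pred else 0.0  # class never predicted: treated as 0
        recall = tp / n_true
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += f1 * n_true / total
    return score


true_labels = [0, 0, 1, 1, 1, 2]
pred_labels = [0, 1, 1, 1, 1, 1]  # class 2 is never predicted
accuracy = sum(t == p for t, p in zip(true_labels, pred_labels)) / len(true_labels)
print(accuracy, weighted_f1(true_labels, pred_labels))
```

A class that is never predicted has an undefined precision; treating it as 0, as done here, pulls the weighted score down in proportion to that class's frequency.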

---
# Loss function code
----

In [25]:
def CELoss(pred_outs, labels):
    loss = nn.CrossEntropyLoss()
    loss_val = loss(pred_outs, labels)
    return loss_val
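For a single sample, cross-entropy over logits is the negative log of the softmax probability assigned to the true class. The sketch below computes it by hand for one sample; the function `cross_entropy` is a hypothetical pure-Python stand-in, not the PyTorch implementation.

```python
import math


def cross_entropy(logits, label):
    """-log(softmax(logits)[label]), computed with the log-sum-exp trick for stability."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[label]


num_classes = 7  # the seven MELD emotions
uniform_loss = cross_entropy([0.0] * num_classes, 3)                           # model knows nothing
confident_loss = cross_entropy([0.0, 0.0, 0.0, 10.0, 0.0, 0.0, 0.0], 3)        # confident and correct
print(uniform_loss, confident_loss)
```

With seven classes, a model that outputs identical logits starts at a loss of log(7), which is a useful reference value when reading the training loss.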

In [26]:
# with torch.no_grad():
#     for i_batch, data in enumerate(tqdm(test_dataloader)):
#         batch_padding_token, batch_padding_attention_mask, batch_PM_input, batch_label = data
#         batch_padding_token = batch_padding_token.cuda()
#         batch_padding_attention_mask = batch_padding_attention_mask.cuda()
        
#         # batch_PM_input holds the earlier utterances of the target speaker; it is returned as nested lists
#         batch_PM_input = [[x2.cuda() for x2 in x1] for x1 in batch_PM_input] # move every tensor in the nested lists to the GPU
#         batch_label = batch_label.cuda()        

#         # model output (logits)
#         pred_logits = erc_model(batch_padding_token, batch_padding_attention_mask, batch_PM_input)
#         aa = pred_logits
#         print(pred_logits.argmax(1).item())

#         break

---
# Save code
----

In [27]:
def SaveModel(model, path):
    # Note: epoch, optimizer, and loss_val are read from the notebook's global scope (they are set in the training cell)
    if not os.path.exists(path):
        os.makedirs(path)
    torch.save({
            'epoch': epoch,
            'model_state_dict': model.state_dict(),
            'optimizer_state_dict': optimizer.state_dict(),
            'loss': loss_val,
            }, os.path.join(path, 'model.bin'))

In [28]:
########################### Loading the model ###########################
# checkpoint = torch.load(PATH)
# model.load_state_dict(checkpoint['model_state_dict'])
# optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
# epoch = checkpoint['epoch']
# loss = checkpoint['loss']

---
# Hyperparameter settings
----

In [29]:
training_epochs = 3
max_grad_norm = 10
lr = 1e-6
num_training_steps = len(train_dataset)*training_epochs
num_warmup_steps = len(train_dataset)
optimizer = torch.optim.AdamW(erc_model.parameters(), lr=lr) # , eps=1e-06, weight_decay=0.01
scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup_steps=num_warmup_steps, num_training_steps=num_training_steps)
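With these settings the learning rate rises linearly during the first epoch and then decays linearly to zero over the remaining two. The sketch below reproduces that shape as a plain function; `linear_warmup_factor` is a hypothetical helper written for illustration, and the 9989 steps per epoch are taken from the training progress bar in this notebook (batch size 1).

```python
def linear_warmup_factor(step, num_warmup_steps, num_training_steps):
    """Multiplier applied to the base learning rate at a given optimizer step."""
    if step < num_warmup_steps:
        return step / max(1, num_warmup_steps)  # linear warmup from 0 to 1
    remaining = num_training_steps - step
    return max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))  # linear decay to 0


base_lr = 1e-6
steps_per_epoch = 9989  # len(train_dataset) with batch_size=1
num_training_steps = steps_per_epoch * 3
num_warmup_steps = steps_per_epoch

for step in (0, steps_per_epoch, 2 * steps_per_epoch, num_training_steps):
    factor = linear_warmup_factor(step, num_warmup_steps, num_training_steps)
    print(step, base_lr * factor)
```

The peak learning rate of 1e-6 is therefore reached only at the end of the first epoch, and the average learning rate over the whole run is about half of that value.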

---
# Training code
----

In [39]:
#################### Create the logger ######################
# With no argument the root logger is returned; with a name, a logger of that name is created
###################################################
logger = logging.getLogger()

################### Set the logging threshold ###################
# Severity levels in increasing order: DEBUG, INFO, WARNING, ERROR, CRITICAL
# The default level is WARNING; with INFO, info-level messages also appear in the output
########################################################
logger.setLevel(logging.INFO)

###################### Log format ######################
# Format of each stored log record
# %(asctime)s - %(name)s - %(levelname)s - %(message)s
# i.e. time, logger name, log level (INFO/ERROR etc.), message
#########################################################
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')

####################### Log output #######################
# Declare a stream_handler and a file_handler for the log output
# file_handler writes to a file, so pass the desired file name as its argument
# The formatter declared above is applied to both
#######################################################
stream_handler = logging.StreamHandler()
stream_handler.setFormatter(formatter)
logger.addHandler(stream_handler)

######################## Write the log to a file ########################
file_handler = logging.FileHandler('erc.log')
file_handler.setFormatter(formatter)
logger.addHandler(file_handler)

logger.info("############Training started############")


best_dev_fscore = 0
save_path = '.'
for epoch in tqdm(range(training_epochs)):
    erc_model.train() 
    for i_batch, data in enumerate(tqdm(train_dataloader)):
        batch_padding_token, batch_padding_attention_mask, batch_PM_input, batch_label = data
        batch_padding_token = batch_padding_token.cuda()
        batch_padding_attention_mask = batch_padding_attention_mask.cuda()
        batch_PM_input = [[x2.cuda() for x2 in x1] for x1 in batch_PM_input]
        batch_label = batch_label.cuda()        
        
        # model output (logits)
        pred_logits = erc_model(batch_padding_token, batch_padding_attention_mask, batch_PM_input)
        
        # loss
        loss_val = CELoss(pred_logits, batch_label)
        
        loss_val.backward()
        torch.nn.utils.clip_grad_norm_(erc_model.parameters(), max_grad_norm)
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()
    
    # Dev & Test evaluation
    erc_model.eval()
    
    dev_acc, dev_pred_list, dev_label_list = CalACC(erc_model, dev_dataloader)
    dev_pre, dev_rec, dev_fbeta, _ = precision_recall_fscore_support(dev_label_list, dev_pred_list, average='weighted')
    
    logger.info("Dev W-avg F1: {}".format(dev_fbeta))
    
    test_acc, test_pred_list, test_label_list = CalACC(erc_model, test_dataloader)
    test_pre, test_rec, test_fbeta, _ = precision_recall_fscore_support(test_label_list, test_pred_list, average='weighted')
    
    # Best Score & Model Save
    if dev_fbeta > best_dev_fscore:
        best_dev_fscore = dev_fbeta

        SaveModel(erc_model, save_path)
        logger.info("Epoch:{}, Test W-avg F1: {}".format(epoch, test_fbeta))

INFO:root:############Training started############
2022-12-07 13:29:35,674 - root - INFO - ############Training started############
2022-12-07 13:29:35,674 - root - INFO - ############Training started############


  0%|          | 0/3 [00:00<?, ?it/s]

  0%|          | 0/9989 [00:00<?, ?it/s]

  0%|          | 0/1109 [00:00<?, ?it/s]

  _warn_prf(average, modifier, msg_start, len(result))
INFO:root:Dev W-avg F1: 0.5730913355462027
2022-12-07 14:05:07,422 - root - INFO - Dev W-avg F1: 0.5730913355462027
2022-12-07 14:05:07,422 - root - INFO - Dev W-avg F1: 0.5730913355462027


  0%|          | 0/2610 [00:00<?, ?it/s]

Token indices sequence length is longer than the specified maximum sequence length for this model (522 > 512). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (547 > 512). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (560 > 512). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (597 > 512). Running this sequence through the model will result in indexing errors
  _warn_prf(average, modifier, msg_start, len(result))
INFO:root:Epoch:0, Test W-avg F1: 0.6063340831825355
2022-12-07 14:06:58,166 - root - INFO - Epoch:0, Test W-avg F1: 0.6063340831825355
2022-12-07 14:06:58,166 - root - INFO - Epoch:0, Test W-avg F1: 0.606334083182535

  0%|          | 0/9989 [00:00<?, ?it/s]

  0%|          | 0/1109 [00:00<?, ?it/s]

  _warn_prf(average, modifier, msg_start, len(result))
INFO:root:Dev W-avg F1: 0.5730913355462027
2022-12-07 14:42:24,532 - root - INFO - Dev W-avg F1: 0.5730913355462027
2022-12-07 14:42:24,532 - root - INFO - Dev W-avg F1: 0.5730913355462027


  0%|          | 0/2610 [00:00<?, ?it/s]

Token indices sequence length is longer than the specified maximum sequence length for this model (522 > 512). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (547 > 512). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (560 > 512). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (597 > 512). Running this sequence through the model will result in indexing errors
  _warn_prf(average, modifier, msg_start, len(result))


  0%|          | 0/9989 [00:00<?, ?it/s]

  0%|          | 0/1109 [00:00<?, ?it/s]

  _warn_prf(average, modifier, msg_start, len(result))
INFO:root:Dev W-avg F1: 0.5730913355462027
2022-12-07 15:19:35,256 - root - INFO - Dev W-avg F1: 0.5730913355462027
2022-12-07 15:19:35,256 - root - INFO - Dev W-avg F1: 0.5730913355462027


  0%|          | 0/2610 [00:00<?, ?it/s]

Token indices sequence length is longer than the specified maximum sequence length for this model (522 > 512). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (547 > 512). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (560 > 512). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (597 > 512). Running this sequence through the model will result in indexing errors
  _warn_prf(average, modifier, msg_start, len(result))
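Each formatted log line in the output above appears twice. One possible cause, stated here as an assumption rather than a diagnosis, is that the logging cell was executed more than once, so the same handlers were attached to the root logger repeatedly. A minimal sketch of a setup function that is safe to re-run, using a named logger (the name `erc_demo` and the helper `get_logger` are hypothetical):

```python
import logging


def get_logger(name="erc_demo", log_file=None):
    """Return a configured logger; calling this again does not add handlers twice."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if logger.handlers:  # already configured, e.g. the cell was run before
        return logger

    formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
    stream_handler = logging.StreamHandler()
    stream_handler.setFormatter(formatter)
    logger.addHandler(stream_handler)

    if log_file is not None:  # optional file output with the same format
        file_handler = logging.FileHandler(log_file)
        file_handler.setFormatter(formatter)
        logger.addHandler(file_handler)

    logger.propagate = False  # do not also pass records to the root logger's handlers
    return logger


first = get_logger()
second = get_logger()  # simulates re-running the cell
first.info("logged once")
```

Setting `propagate = False` also keeps the environment's own root handler from printing a second, differently formatted copy of each message.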


---
# Error Samples
---

In [31]:
def ErrorSamples(model, dataloader):
    model.eval()
    correct = 0
    label_list = []
    pred_list = []    
    
    error_samples = []
    # label arrange
    with torch.no_grad():
        for i_batch, data in enumerate(tqdm(dataloader)):
            """Data preparation"""
            batch_padding_token, batch_padding_attention_mask, batch_PM_input, batch_label = data
            batch_padding_token = batch_padding_token.cuda()
            batch_padding_attention_mask = batch_padding_attention_mask.cuda()
            batch_PM_input = [[x2.cuda() for x2 in x1] for x1 in batch_PM_input]
            batch_label = batch_label.cuda()        

            """Prediction"""
            pred_logits = model(batch_padding_token, batch_padding_attention_mask, batch_PM_input) # use the model passed in as an argument
            
            """Calculation"""    
            pred_label = pred_logits.argmax(1).item()
            true_label = batch_label.item()
            
            pred_list.append(pred_label)
            label_list.append(true_label)            
            if pred_label != true_label:
                error_samples.append([batch_padding_token, true_label, pred_label]) # keep the misclassified input
            if pred_label == true_label:
                correct += 1
        acc = correct/len(dataloader) # number of batches equals number of samples because batch_size=1
    return error_samples, acc, pred_list, label_list

In [32]:
error_samples, acc, pred_list, label_list = ErrorSamples(erc_model, test_dataloader)

  0%|          | 0/2610 [00:00<?, ?it/s]

Token indices sequence length is longer than the specified maximum sequence length for this model (522 > 512). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (547 > 512). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (560 > 512). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (597 > 512). Running this sequence through the model will result in indexing errors
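Besides reading individual samples, the collected errors can be summarized by counting which true emotion is confused with which predicted emotion. A minimal sketch, assuming the `[tokens, true_label, pred_label]` record layout produced by `ErrorSamples`; the helper `confusion_counts` and the toy records are hypothetical, and the token tensors are replaced by `None`.

```python
from collections import Counter

emo_list = ['anger', 'disgust', 'fear', 'joy', 'neutral', 'sadness', 'surprise']


def confusion_counts(error_samples):
    """Count (true emotion, predicted emotion) pairs among misclassified samples."""
    return Counter((emo_list[t], emo_list[p]) for _, t, p in error_samples)


# Hypothetical error records: [tokens, true_label, pred_label]
toy_errors = [[None, 1, 4], [None, 5, 4], [None, 1, 4], [None, 6, 3]]
counts = confusion_counts(toy_errors)
for (true_emo, pred_emo), n in counts.most_common():
    print(f"{true_emo} -> {pred_emo}: {n}")
```

Sorting by frequency shows at a glance whether the errors are dominated by a single confusion, such as minority emotions being predicted as `neutral`.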


In [37]:
import random
random_error_samples = random.sample(error_samples, 5)

In [38]:
for random_error_sample in random_error_samples:
    batch_padding_token, true_label, pred_label = random_error_sample
    print('--------------------------------------------------------')
    print("Input sentences: ", test_dataset.tokenizer.decode(batch_padding_token.squeeze(0).tolist()))
    print("True emotion: ", test_dataset.emoList[true_label])
    print("Predicted emotion: ", test_dataset.emoList[pred_label])

--------------------------------------------------------
Input sentences:  <s> As bad as that went, I actually enjoyed myself. I think that I’m going to apologize for all of the stupid things I do.</s> Why don’t you just stop doing stupid things? Then you wouldn’t have to apologize.</s> I would really love it if I could do both.</s> All right, I…I have to ask.</s> What?</s> Are you gonna break up with me if I get fat again?</s> What?!</s> Well, you broke up with Julie Grath! How much weight could she have gained?</s> A hundred and forty-five pounds.</s>
True emotion:  disgust
Predicted emotion:  neutral
--------------------------------------------------------
Input sentences:  <s> It’s so secluded up here.</s> I know. I like it up here.</s> I feel like we’re the only two people in the world.  Oops. Sorry.</s> What’s the matter honey? Did you see a little mouse?</s> No-no! Big bear! Big bear outside! I think I-I—would you—actually, would you go check on that?</s> Honey, we don’t have any bears here.</s> Well, okay. Would-w