# TRADE 실습
😁 본 노트북은 TRADE를 한 줄 씩 실행하며 Input과 Output의 전체적인 Flow를 학습하기 위한 것입니다. 이를 통해 DST TRADE Model Task에 대해 올바른 이해를 할 수 있습니다.

### 1. TRADE_preprocessor
convert examples to features (input_ids, segment_ids, target_ids, slot_meta) 구현

In [101]:
import json
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F

from torch.utils.data import DataLoader
from transformers import BertModel, BertTokenizer, AutoTokenizer, AutoModel
from data_utils import *
from preprocessor import *
from tqdm import tqdm

In [12]:
cd code

/opt/ml/code


In [18]:
#-> json 파일 불러와서 DSTInputExample 객체 생성
train_data, dev_data, dev_label = load_dataset("/opt/ml/input/data/train_dataset/train_dials.json")
with open("/opt/ml/input/data/train_dataset/slot_meta.json") as f:
    slot_meta = json.load(f)
with open("/opt/ml/input/data/train_dataset/ontology.json") as f:
    ontology = json.load(f)
tokenizer = AutoTokenizer.from_pretrained("dsksd/bert-ko-small-minimal")
    
#대화 2개 추출, 단 턴의 길이는 각자 다름
train_data = train_data[0:2]

#대화를 턴 별로 DSTInputExample 객체에 맞게 추출
train_example = get_examples_from_dialogues(train_data)

#tokenizing
processor = TRADEPreprocessor(slot_meta, tokenizer)
train_dataset = processor.convert_examples_to_features(train_example)
print(f"현재 {len(train_data)} 개의 주제에 대해 각각 턴으로 추출한 총 대화데이터셋 개수 : {len(train_dataset)} 개")
train_dataset

100%|██████████| 2/2 [00:00<00:00, 3671.16it/s]

Word drop: 0.0
현재 2 개의 주제에 대해 각각 턴으로 추출한 총 대화데이터셋 개수 : 12 개





In [112]:
'''
==> input ids
    턴과 턴 사이에는 [SEP] 토큰
    tokenizer.encode
    max_length 길이 맞추고 마지막에 [CLS] {input_ids} [SEP] 로 만듬
'''
#train_dataset = train_features
train_features = []
for example in tqdm(train_example):
    # 턴과 턴 사이에 [SEP]
    dialogue_context = " [SEP] ".join(example.context_turns + example.current_turn)
    # tokenizer.encode
    input_ids = tokenizer.encode(dialogue_context, add_special_tokens=False)
    # max_length 길이 맞추고 마지막에 [CLS] {input_ids} [SEP] 로 만듬
    max_seq_length = 512 - 2 #마지막에 [CLS], [SEP] 붙일 것을 빼줌

    # 만약 max_seq_length 보다 input_ids가 크면 왼쪽부터 truncate. 왼쪽이 오래된 발화이기 때문이다.
    if len(input_ids) > max_seq_length:
        gap_idx = len(input_ids) - max_seq_length
        input_ids = input_ids[gap_idx:]
    # 앞 뒤에 토큰 붙여주기
    input_ids = [tokenizer.cls_token_id] + input_ids + [tokenizer.sep_token_id]
    '''
    ==> segment ids
    bert tokenizer를 사용할 때, 문장을 구분해주고자 할 때 쓰는 id 이다.
    '''
    firt_sep_idx = input_ids.index(tokenizer.sep_token_id)
    segment_id = [0] * len(input_ids[: firt_sep_idx + 1]) + [1] * len(
        input_ids[firt_sep_idx + 1 :]
    )
    '''
    ==> target id
    example의 label(domain-slot-value)을 dict("domain-slot":"value")로 변환
    slot meta 전체 slot에 대하여 해당 state의 value 값을 업데이트, 없으면 0
    tokenizer.encode
    뒤에 [SEP] 토큰 부착
    길이를 같게 만들어 주기 위해서 max_length 만큼 [PAD] 토큰 부착
    '''
    state = convert_state_dict(example.label)
    gate_ids = []
    target_ids = []
    for slot in slot_meta:
        #slot meta의 slot에서 현재 state의 값이 있으면 해당 슬롯의 value를 가져오고 없으면 none
        value = state.get(slot, "none")
        #tokenizer.encode
        target_id = tokenizer.encode(value, add_special_tokens=False)
        target_id += [tokenizer.sep_token_id]
        '''
        ==> gate id
        target id의 value를 gating2id를 통해 index로 변환
        '''
        gating2id = {"none":0, "dontcare":1, "yes":2, "no":3, "ptr":4}
        #전체 gate_ids, target_ids에 추가
        gate_ids.append(gating2id.get(value, gating2id["ptr"]))
        target_ids.append(target_id)
    # max length 만큼 [PAD] 부착
    max_length = max(list(map(len, target_ids)))
    target_ids = [target_id + [tokenizer.pad_token_id]*(max_length-len(target_id)) for target_id in target_ids]
    
    #OpenVocabDSTFeature(guid, input_ids, segment_ids, gating_ids, target_ids) 총 5개의 데이터 구성
    train_features.append(OpenVocabDSTFeature(example.guid, input_ids, segment_ids, gate_ids, target_ids))
print(f"총 {len(train_features)} 개의 features가 있습니다")
train_features

100%|██████████| 12/12 [00:00<00:00, 178.49it/s]

총 12 개의 features가 있습니다





[OpenVocabDSTFeature(guid='snowy-hat-8324:관광_식당_11-0', input_id=[2, 3, 6265, 6672, 4073, 3249, 4034, 8732, 4292, 6722, 4076, 8553, 3], segment_id=[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], gating_id=[0, 0, 0, 0, 0, 0, 4, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], target_ids=[[21832, 11764, 3], [21832, 11764, 3], [21832, 11764, 3], [21832, 11764, 3], [21832, 11764, 3], [21832, 11764, 3], [8732, 3, 0], [21832, 11764, 3], [6265, 6672, 3], [21832, 11764, 3], [21832, 11764, 3], [21832, 11764, 3], [21832, 11764, 3], [21832, 11764, 3], [21832, 11764, 3], [21832, 11764, 3], [21832, 11764, 3], [21832, 11764, 3], [21832, 11764, 3], [21832, 11764, 3], [21832, 11764, 3], [218

### 2. TRADE Model Architecture
TRADE 모듈은 총 3가지로 구성되어 있다.
- Utterance encoder - Bidirection GRU based encoder(Bert 가능)
- Slot(State) generator - GRU based decoder(transformer decoder 가능)
- Slot gate - pointer generator

In [95]:
#Utterance encoder
class GRUEncoder(nn.Module):
    def __init__(self, vocab_size, d_model, n_layer, dropout, proj_dim=None, pad_idx=0):
        super(GRUEncoder, self).__init__()
        self.pad_idx = pad_idx
        self.embed = nn.Embedding(vocab_size, d_model, padding_idx=pad_idx)
        if proj_dim:
            self.proj_layer = nn.Linear(d_model, proj_dim, bias=False)
        else:
            self.proj_layer = None

        self.d_model = proj_dim if proj_dim else d_model
        self.gru = nn.GRU(
            self.d_model,
            self.d_model,
            n_layer,
            dropout=dropout,
            batch_first=True,
            bidirectional=True,
        )
        self.dropout = nn.Dropout(dropout)

    def forward(self, input_ids):
        mask = input_ids.eq(self.pad_idx).unsqueeze(-1)
        x = self.embed(input_ids)
        if self.proj_layer:
            x = self.proj_layer(x)
        x = self.dropout(x)
        o, h = self.gru(x)
        o = o.masked_fill(mask, 0.0)
        # bidirectional 이라 두개 이어주는거
        output = o[:, :, : self.d_model] + o[:, :, self.d_model :]
        hidden = h[0] + h[1]  # n_layer 고려
        return output, hidden

In [137]:
class SlotGenerator(nn.Module):
    def __init__(
        self, vocab_size, hidden_size, dropout, n_gate, proj_dim=None, pad_idx=0
    ):
        super(SlotGenerator, self).__init__()
        self.pad_idx = pad_idx
        self.vocab_size = vocab_size
        self.embed = nn.Embedding(
            vocab_size, hidden_size, padding_idx=pad_idx
        )  # shared with encoder

        if proj_dim:
            self.proj_layer = nn.Linear(hidden_size, proj_dim, bias=False)
        else:
            self.proj_layer = None
        self.hidden_size = proj_dim if proj_dim else hidden_size

        self.gru = nn.GRU(
            self.hidden_size, self.hidden_size, 1, dropout=dropout, batch_first=True
        )
        self.n_gate = n_gate
        self.dropout = nn.Dropout(dropout)
        self.w_gen = nn.Linear(self.hidden_size * 3, 1)
        self.sigmoid = nn.Sigmoid()
        self.w_gate = nn.Linear(self.hidden_size, n_gate)

    def set_slot_idx(self, slot_vocab_idx):
        whole = []
        max_length = max(map(len, slot_vocab_idx))
        for idx in slot_vocab_idx:
            if len(idx) < max_length:
                gap = max_length - len(idx)
                idx.extend([self.pad_idx] * gap)
            whole.append(idx)
        self.slot_embed_idx = whole  # torch.LongTensor(whole)

    def embedding(self, x):
        x = self.embed(x)
        if self.proj_layer:
            x = self.proj_layer(x)
        return x

    def forward(
        self, input_ids, encoder_output, hidden, input_masks, max_len, teacher=None
    ):
        print(input_masks.shape)
        input_masks = input_masks.ne(1) #input mask 반전
        print(input_masks.shape) # 
        # J, slot_meta : key : [domain, slot] ex> LongTensor([1,2])
        # J,2
        batch_size = encoder_output.size(0)
        slot = torch.tensor(
            self.slot_embed_idx, device=input_ids.device, dtype=torch.int64
        )  ##
        slot_e = torch.sum(self.embedding(slot), 1)  # J,d
        J = slot_e.size(0)

        all_point_outputs = torch.zeros(
            batch_size, J, max_len, self.vocab_size, device=input_ids.device
        )

        # Parallel Decoding
        w = slot_e.repeat(batch_size, 1).unsqueeze(1)
        hidden = hidden.repeat_interleave(J, dim=1)
        encoder_output = encoder_output.repeat_interleave(J, dim=0)
        input_ids = input_ids.repeat_interleave(J, dim=0)
        input_masks = input_masks.repeat_interleave(J, dim=0)
        for k in range(max_len):
            w = self.dropout(w)
            _, hidden = self.gru(w, hidden)  # 1,B,D

            # B,T,D * B,D,1 => B,T
            attn_e = torch.bmm(encoder_output, hidden.permute(1, 2, 0))  # B,T,1
            attn_e = attn_e.squeeze(-1).masked_fill(input_masks, -1e4)
            attn_history = F.softmax(attn_e, -1)  # B,T

            if self.proj_layer:
                hidden_proj = torch.matmul(hidden, self.proj_layer.weight)
            else:
                hidden_proj = hidden

            # B,D * D,V => B,V
            attn_v = torch.matmul(
                hidden_proj.squeeze(0), self.embed.weight.transpose(0, 1)
            )  # B,V
            attn_vocab = F.softmax(attn_v, -1)

            # B,1,T * B,T,D => B,1,D
            context = torch.bmm(attn_history.unsqueeze(1), encoder_output)  # B,1,D
            p_gen = self.sigmoid(
                self.w_gen(torch.cat([w, hidden.transpose(0, 1), context], -1))
            )  # B,1
            p_gen = p_gen.squeeze(-1)

            p_context_ptr = torch.zeros_like(attn_vocab, device=input_ids.device)
            p_context_ptr.scatter_add_(1, input_ids, attn_history)  # copy B,V
            p_final = p_gen * attn_vocab + (1 - p_gen) * p_context_ptr  # B,V
            _, w_idx = p_final.max(-1)

            if teacher is not None:
                w = (
                    self.embedding(teacher[:, :, k])
                    # .transpose(0, 1)
                    .reshape(batch_size * J, 1, -1)
                )
            else:
                w = self.embedding(w_idx).unsqueeze(1)  # B,1,D
            if k == 0:
                gated_logit = self.w_gate(context.squeeze(1))  # B,3
                all_gate_outputs = gated_logit.view(batch_size, J, self.n_gate)
            all_point_outputs[:, :, k, :] = p_final.view(batch_size, J, self.vocab_size)

        return all_point_outputs, all_gate_outputs

In [169]:
vocab_size = len(tokenizer)
hidden_dim = 768
n_layer = 1
dropout = 0.1
n_gate = 5
max_len = 512
#dataset 형식으로 바꾸어줌
train_data = WOSDataset(train_features)
train_loader = DataLoader(
        train_data,
        batch_size=1,
        collate_fn=processor.collate_fn,
    )
for batch in train_loader:
    #word2vec 과 char을 쓰는 것이 아닌, torch embedding 사용
    input_ids, segment_ids, input_masks, gating_ids, target_ids, guids = batch
    utterance_encoder = GRUEncoder(vocab_size, hidden_dim, n_layer, dropout)
    output, hidden = utterance_encoder(input_ids) # 1 X encoder_max_seq_len X hidden_dim (1 X 512 X 768)
    
    #Slot generator
    '''
    ==> tokenized slot meta
    tokenizer.encode
    '''
    tokenized_slot_meta = []
    for slot in slot_meta:
        tokenized_slot_meta.append(
            #하이푼을 제거하고 tokenizer.encdoing
            tokenizer.encode(slot.replace("-", " "), add_special_tokens=False)
        )
    slot_generator = SlotGenerator(vocab_size, hidden_dim, dropout, n_gate)
    slot_generator.set_slot_idx(tokenized_slot_meta)
    slot_generator.embed.weight = utterance_encoder.embed.weight
    if slot_generator.proj_layer:
        slot_generator.proj_layer.weight = utterance_encoder.proj_layer.weight
    
    slot = torch.LongTensor(slot_generator.slot_embed_idx).to(input_ids.device) # J X emb_dim
    print(f"J X emb_dim : {slot.shape}")
    slot_e = torch.sum(slot_generator.embed(slot), 1) # J X hidden_dim
    print(f"J X hidden_dim : {slot_e.shape}")
    J = slot_e.size(0)
    #print(slot_generator(input_ids, output, hidden.unsqueeze(0), input_masks, target_ids.size(-1), None))
    
    #zero tensor initialize
    batch_size = 1
    all_point_outputs = torch.zeros(J, batch_size, max_len, vocab_size).to(input_ids.device)
    all_gate_outputs = torch.zeros(J, batch_size, n_gate).to(input_ids.device)
    
    for j in range(J):
        print(slot_e[j].shape) # hidden
        w = slot_e[j].expand(batch_size, 1, hidden_dim) #b X 1 X hidden
        print(w.shape)
        slot_value = []
        for k in range(max_len):
            w = 
    break

  "num_layers={}".format(dropout, num_layers))


J X emb_dim : torch.Size([45, 4])
J X hidden_dim : torch.Size([45, 768])
torch.Size([768])
torch.Size([1, 1, 768])
torch.Size([768])
torch.Size([1, 1, 768])
torch.Size([768])
torch.Size([1, 1, 768])
torch.Size([768])
torch.Size([1, 1, 768])
torch.Size([768])
torch.Size([1, 1, 768])
torch.Size([768])
torch.Size([1, 1, 768])
torch.Size([768])
torch.Size([1, 1, 768])
torch.Size([768])
torch.Size([1, 1, 768])
torch.Size([768])
torch.Size([1, 1, 768])
torch.Size([768])
torch.Size([1, 1, 768])
torch.Size([768])
torch.Size([1, 1, 768])
torch.Size([768])
torch.Size([1, 1, 768])
torch.Size([768])
torch.Size([1, 1, 768])
torch.Size([768])
torch.Size([1, 1, 768])
torch.Size([768])
torch.Size([1, 1, 768])
torch.Size([768])
torch.Size([1, 1, 768])
torch.Size([768])
torch.Size([1, 1, 768])
torch.Size([768])
torch.Size([1, 1, 768])
torch.Size([768])
torch.Size([1, 1, 768])
torch.Size([768])
torch.Size([1, 1, 768])
torch.Size([768])
torch.Size([1, 1, 768])
torch.Size([768])
torch.Size([1, 1, 768])
tor