# char-RNN-文本生成
## 教學目標
使用 RNN 弄出一個基本的生成文字模型，幫助初學者上手 RNN

## 適用對象
適用於已經學過 PyTorch 基本語法的人

## 執行方法
在 Jupyter notebook 中，選取想要執行的區塊後，使用以下其中一種方法執行

- 上方工具列中，按下 Cell < Run Cells 執行
- 使用快捷鍵 Shift + Enter 執行

## 大綱
- [載入資料](#載入資料)
- [前處理](#前處理)
- [建立字典](#建立字典)
- [超參數](#超參數)
- [資料分批](#資料分批)
- [模型設計](#模型設計)
- [訓練](#訓練)
- [生成](#生成)

## 檔案來源
- [Kaggle HC 新聞資料集](https://www.kaggle.com/alvations/old-newspapers#old-newspaper.tsv)
- 下載後請放到路徑 `專案資料夾/data/old-newspaper.tsv`

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [2]:
from google.colab import drive
drive.mount('/content/gdrive')

Mounted at /content/gdrive


In [3]:
!pip install opencc

Collecting opencc
  Downloading OpenCC-1.1.7-cp310-cp310-manylinux1_x86_64.whl (779 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m779.8/779.8 kB[0m [31m4.2 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: opencc
Successfully installed opencc-1.1.7


In [4]:
import numpy as np
import pandas as pd
import torch
import torch.nn
import torch.nn.utils.rnn
import torch.utils.data
import matplotlib.pyplot as plt
import seaborn as sns
import os
import opencc

data_path = './gdrive/MyDrive/ikm_lab/GAI/data'

# 載入資料
- 請務必先[下載](https://www.kaggle.com/alvations/old-newspapers#old-newspaper.tsv)資料後將資料放置到 `data` 資料夾之下
- `tsv` 檔案類似 `csv`，只是用 `\t` 做分隔符號
- 資料內容包含

|欄位|意義|資料型態|
|-|-|-|
|`Language`|語系|文字（類別）|
|`Source`|新聞來源|文字|
|`Date`|時間|文字|
|`Text`|文字內容|文字|

In [5]:
df = pd.read_csv(os.path.join(data_path + '/arithmetic.csv'))
# 看一下前幾筆資料是什麼樣子
# df.head()

# 前處理
- 訓練目標為生成繁體中文字
    - 所以只考量繁體中文的資料
    - 類別為 `Chinese (Traditional)`
    - 共約 333735 筆
- 資料長度不一
    - 畫出長度分佈圖
    - 計算長度四分位數、最小值、最大值
    - 為了方便訓練，只考慮長度介於 60~200 的新聞

In [6]:
# df[df['Language'] == 'Chinese (Traditional)'].shape
df.shape

(2632500, 2)

In [7]:
# 只取前7000筆，因爲原資料量太大了，不方便演示
# df = df[df['Language'] == 'Chinese (Traditional)'].iloc[:7000]
# df['src'].iloc[:]

# df = df.iloc[:26325]
df['src'] = df['src'].astype(str)
df['tgt'] = df['tgt'].astype(str)
df['combined'] = df['src'].str.cat(df['tgt'])


In [8]:
df.shape

(2632500, 3)

# 建立字典
- 無法直接利用純文字進行計算
- 將所有文字轉換成數字
- 字典大小約為 `7000`
- 特殊字
    - '&lt;pad&gt;'
        - 每個 batch 所包含的句子長度不同
        - 將長度使用 '&lt;pad&gt;' 補成 batch 中最大值者
    - '&lt;eos&gt;'
        - 指定生成的結尾
        - 沒有 '&lt;eos&gt;' 會不知道何時停止生成

In [9]:
# 一個dict把中文字符轉化成id
char_to_id = {}
# 把id轉回中文字符
id_to_char = {}

# 有一些必須要用的special token先添加進來(一般用來做padding的token的id是0)
char_to_id['<pad>'] = 0
char_to_id['<eos>'] = 1
id_to_char[0] = '<pad>'
id_to_char[1] = '<eos>'


# 把所有資料集中出現的token都記錄到dict中
for char in set(df['combined'].str.cat()):
    ch_id = len(char_to_id)
    char_to_id[char] = ch_id
    id_to_char[ch_id] = char

vocab_size = len(char_to_id)
print('字典大小: {}'.format(vocab_size))

字典大小: 18


In [10]:
# # 把資料集的所有資料都變成id
df['char_id_list'] = df['combined'].apply(lambda text: [char_to_id[char] for char in list(text)] + [char_to_id['<eos>']])
df['ans_id_list'] = df['tgt'].apply(lambda text: [char_to_id[char] for char in list(text)] + [char_to_id['<eos>']])
# df[['src','tgt','combined','char_id_list','ans_id_list']].head()

In [11]:
brackets_df = pd.DataFrame(columns=['src', 'tgt','combined','char_id_list','ans_id_list'])
no_brackets_df = pd.DataFrame(columns=['src', 'tgt','combined','char_id_list','ans_id_list'])

positive_df = pd.DataFrame(columns=['src', 'tgt','combined','char_id_list','ans_id_list'])
negative_df = pd.DataFrame(columns=['src', 'tgt','combined','char_id_list','ans_id_list'])
total_df = pd.DataFrame(columns=['src', 'tgt','combined','char_id_list','ans_id_list'])


first_df = df.iloc[:10000]
last_df = df.iloc[2622500:]

for src,tgt,combined,char_id,ans_id in zip(first_df['src'],first_df['tgt'],first_df['combined'],first_df['char_id_list'],first_df['ans_id_list']):

  total_df.loc[len(total_df)] = [src,tgt,combined,char_id,ans_id]
  if '(' in src:
    brackets_df.loc[len(brackets_df)] = [src,tgt,combined,char_id,ans_id]
  else :
    no_brackets_df.loc[len(no_brackets_df)] = [src,tgt,combined,char_id,ans_id]
  if '-' in tgt:
    negative_df.loc[len(negative_df)] = [src,tgt,combined,char_id,ans_id]
  else  :
    positive_df.loc[len(positive_df)] = [src,tgt,combined,char_id,ans_id]
for src,tgt,combined,char_id,ans_id in zip(last_df['src'],last_df['tgt'],last_df['combined'],last_df['char_id_list'],last_df['ans_id_list']):
  total_df.loc[len(total_df)] = [src,tgt,combined,char_id,ans_id]
  if '(' in src:
    brackets_df.loc[len(brackets_df)] = [src,tgt,combined,char_id,ans_id]
  else :
    no_brackets_df.loc[len(no_brackets_df)] = [src,tgt,combined,char_id,ans_id]
  if '-' in tgt:
    negative_df.loc[len(negative_df)] = [src,tgt,combined,char_id,ans_id]
  else  :
    positive_df.loc[len(positive_df)] = [src,tgt,combined,char_id,ans_id]


# 超參數

|超參數|意義|數值|
|-|-|-|
|`batch_size`|單一 batch 的資料數|64|
|`epochs`|總共要訓練幾個 epoch|10|
|`embed_dim`|文字的 embedding 維度|50|
|`hidden_dim`|LSTM 中每個時間的 hidden state 維度|50|
|`lr`|Learning Rate|0.001|
|`grad_clip`|為了避免 RNN 出現梯度爆炸問題，將梯度限制範圍|1|

In [12]:
batch_size = 1000
epochs = 3
embed_dim = 256
hidden_dim = 256
lr = 0.001
grad_clip = 1

# 資料分批
- 使用 `torch.utils.data.Dataset` 建立資料產生的工具 `dataset`
- 再使用 `torch.utils.data.DataLoader` 對資料集 `dataset` 隨機抽樣並作為一個 batch


In [13]:
# 這裏的dataset是文本生成的dataset，輸入和輸出的資料都是文章
# 舉個例子，現在的狀況是：
# input:  A B C D E F
# output: B C D E F <eos>
# 而對於加減法的任務：
# input:  1 + 2 + 3 = 6
# output: / / / / / 6 <eos>
# /的部分都不用算loss，主要是預測=的後面，這裏的答案是6，所以output是6 <eos>
class Dataset(torch.utils.data.Dataset):
    def __init__(self, equation, answer):
        self.equation = equation
        self.answer = answer

    def __getitem__(self, index):
        # input:  A B C D E F
        # output: B C D E F <eos>
        x = self.equation.iloc[index][:]
        y = self.answer.iloc[index][:]
        return x, y

    def __len__(self):
        return len(self.equation)

def collate_fn(batch):
    batch_x = [torch.tensor(data[0]) for data in batch] # list[torch.tensor]
    batch_y = [torch.tensor(data[1]) for data in batch] # list[torch.tensor]
    batch_x_lens = torch.LongTensor([len(x) for x in batch_x])
    batch_y_lens = torch.LongTensor([len(y) for y in batch_y])

    pad_batch_x = torch.nn.utils.rnn.pad_sequence(batch_x,
                            batch_first=True, # shape=(batch_size, seq_len)
                            padding_value=char_to_id['<pad>'])

    pad_batch_y = torch.nn.utils.rnn.pad_sequence(batch_y,
                            batch_first=True, # shape=(batch_size, seq_len)
                            padding_value=char_to_id['<pad>'])

    return pad_batch_x, pad_batch_y, batch_x_lens, batch_y_lens


In [14]:
math_dataset = Dataset(df['char_id_list'].iloc[263000:283000],df['ans_id_list'].iloc[263000:283000])
validation_dataset = Dataset(df['char_id_list'].iloc[226364:226428],df['ans_id_list'].iloc[226364:226428])
bracket_dataset = Dataset(brackets_df['char_id_list'],brackets_df['ans_id_list'])
no_brackets_dataset = Dataset(no_brackets_df['char_id_list'],no_brackets_df['ans_id_list'])
positive_dataset = Dataset(positive_df['char_id_list'],positive_df['ans_id_list'])
negative_dataset = Dataset(negative_df['char_id_list'],negative_df['ans_id_list'])
total_dataset = Dataset(total_df['char_id_list'],total_df['ans_id_list'])

In [15]:
train_data_loader = torch.utils.data.DataLoader(math_dataset,
                      batch_size=batch_size,
                      shuffle=True,
                      collate_fn=collate_fn)
train_brackets_data_loader = torch.utils.data.DataLoader(bracket_dataset,
                      batch_size=batch_size,
                      shuffle=True,
                      collate_fn=collate_fn)
train_no_brackets_data_loader = torch.utils.data.DataLoader(no_brackets_dataset,
                      batch_size=batch_size,
                      shuffle=True,
                      collate_fn=collate_fn)
train_positive_data_loader = torch.utils.data.DataLoader(positive_dataset,
                      batch_size=batch_size,
                      shuffle=True,
                      collate_fn=collate_fn)
train_negative_data_loader = torch.utils.data.DataLoader(negative_dataset,
                      batch_size=batch_size,
                      shuffle=True,
                      collate_fn=collate_fn)
train_total_data_loader = torch.utils.data.DataLoader(total_dataset,
                      batch_size=batch_size,
                      shuffle=True,
                      collate_fn=collate_fn)
validation_data_loader = torch.utils.data.DataLoader(validation_dataset,
                      batch_size=batch_size,
                      shuffle=True,
                      collate_fn=collate_fn)

# 模型設計

## 執行順序
1. 將句子中的所有字轉換成 embedding
2. 按照句子順序將 embedding 丟入 LSTM
3. LSTM 的輸出再丟給 LSTM，可以接上更多層
4. 最後的 LSTM 所有時間點的輸出丟進一層 Fully Connected
5. 輸出結果所有維度中的最大者即為下一個字

## 損失函數
因為是類別預測，所以使用 Cross Entropy

## 梯度更新
使用 Adam 演算法進行梯度更新

In [16]:
class CharRNN(torch.nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim):
        super(CharRNN, self).__init__()

        # Embedding層
        self.embedding = torch.nn.Embedding(num_embeddings=vocab_size,
                                            embedding_dim=embed_dim,
                                            padding_idx=char_to_id['<pad>'])

        # RNN層
        self.rnn_layer1 = torch.nn.LSTM(input_size=embed_dim,
                                        hidden_size=hidden_dim,
                                        batch_first=True)

        self.rnn_layer2 = torch.nn.LSTM(input_size=hidden_dim,
                                  hidden_size=hidden_dim,
                                  batch_first=True)

        # output層
        self.linear = torch.nn.Sequential(torch.nn.Linear(in_features=hidden_dim,
                                  out_features=hidden_dim),
                                  torch.nn.ReLU(),
                                  torch.nn.Linear(in_features=hidden_dim,
                                  out_features=vocab_size))

    def forward(self, batch_x, batch_x_lens):
        return self.encoder(batch_x, batch_x_lens)

    def encoder(self, batch_x, batch_x_lens):
        batch_x = self.embedding(batch_x)
        batch_x = torch.nn.utils.rnn.pack_padded_sequence(batch_x,
                                                          batch_x_lens,
                                                          batch_first=True,
                                                          enforce_sorted=False)

        batch_x, _ = self.rnn_layer1(batch_x)
        batch_x, _ = self.rnn_layer2(batch_x)

        batch_x, _ = torch.nn.utils.rnn.pad_packed_sequence(batch_x,
                                                            batch_first=True)

        batch_x = self.linear(batch_x)

        return batch_x

    def generator(self, start_char, max_len=200):

        char_list = [char_to_id[start_char]]

        next_char = None

        # 生成的長度沒達到max_len就一直生
        while len(char_list) < max_len:
            x = torch.LongTensor(char_list).unsqueeze(0)
            x = self.embedding(x)
            _, (ht, _) = self.rnn_layer1(x)
            _, (ht, _) = self.rnn_layer2(ht)
            y = self.linear(ht)

            next_char = np.argmax(y.numpy())
            # 如果看到新的token是<eos>就說明生成結束了，就停下
            if next_char == char_to_id['<eos>']:
                break

            char_list.append(next_char)

        return [id_to_char[ch_id] for ch_id in char_list]

In [17]:
torch.manual_seed(2)
device = torch.device("cuda:0") if torch.cuda.is_available() else torch.device("cpu")

model = CharRNN(vocab_size,
                embed_dim,
                hidden_dim)

In [18]:
criterion = torch.nn.CrossEntropyLoss(ignore_index=char_to_id['<pad>'], reduction='mean')
optimizer = torch.optim.Adam(model.parameters(), lr=lr)

# 訓練
1. 最外層的 `for` 迴圈控制 `epoch`
    1. 內層的 `for` 迴圈透過 `data_loader` 取得 batch
        1. 丟給 `model` 進行訓練
        2. 預測結果 `batch_pred_y` 跟真正的答案 `batch_y` 進行 Cross Entropy 得到誤差 `loss`
        3. 使用 `loss.backward` 自動計算梯度
        4. 使用 `torch.nn.utils.clip_grad_value_` 將梯度限制在 `-grad_clip` &lt; &lt; `grad_clip` 之間
        5. 使用 `optimizer.step()` 進行更新（back propagation）
2. 每 `1000` 個 batch 就輸出一次當前的 loss 觀察是否有收斂的趨勢

In [22]:
from tqdm import tqdm
model = model.to(device)
model.train()
i = 0
total_loss = 0.0
num_batches = 0
for epoch in range(1, epochs+1):
    process_bar = tqdm(train_brackets_data_loader, desc=f"Training brackets epoch {epoch}")
    for batch_x, batch_y, batch_x_lens, batch_y_lens in process_bar:

        # 標準DL訓練幾板斧
        optimizer.zero_grad()
        batch_pred_y = model(batch_x.to(device), batch_x_lens)

        # 找batch_x的等號位置
        equ_pos = []
        for tensor in batch_x:
          for idx, value in enumerate(tensor):
            if value.item() == char_to_id['=']:
                equ_pos.append(idx)
        # 把從18個字元中，找出機率最大的就是預測的字元
        tensor_y = torch.zeros_like(batch_y)
        for i in range(0,batch_y.size(0)):
          idx = equ_pos[i]+1
          for j in range(0,batch_y.size(1)):
            if j+idx >= batch_pred_y.size(1):
              break
            tensor_y[i][j]=torch.argmax(batch_pred_y[i][idx+j],0)
        # 看tensor_pred_y和batch_y
        # for X,tensor,batch in zip(batch_x,tensor_y,batch_y):
        #   for idx in X:
        #     if id_to_char[idx.item()] == '=':
        #       print(id_to_char[idx.item()])
        #       break
        #     print(id_to_char[idx.item()],end=' ')
        #   print('pred_y:',end='')
        #   for val in tensor:
        #     if id_to_char[val.item()] == '<eos>':
        #       break
        #     print(id_to_char[val.item()],end=" ")
        #   print()
        #   print('batch_y:',end='')
        #   for val in batch:
        #     if id_to_char[val.item()] == '<eos>':
        #       break
        #     print(id_to_char[val.item()],end=" ")
        #   print()
        tensor_pred_y = torch.empty(batch_y.size(0),batch_y.size(1),18)
        # 把等號後面部分取出來
        for i in range(0,batch_y.size(0)):
          equ = equ_pos[i]+1
          for j in range(0,batch_y.size(1)):
            if j+equ >= batch_pred_y.size(1):
              break
            for k in range(0,18):
              # print(batch_pred_y[i][equ+j][k])
              tensor_pred_y[i][j][k] = batch_pred_y[i][equ+j][k]
        tensor_pred_y = tensor_pred_y.view(-1,vocab_size)
        batch_y = batch_y.view(-1).to(device)

        loss = criterion(tensor_pred_y,batch_y)
        loss.backward()
        torch.nn.utils.clip_grad_value_(model.parameters(), grad_clip)
        optimizer.step()
        total_loss += loss.item()
        num_batches += 1
        i+=1
        if i%10==0:
          process_bar.set_postfix(loss=loss.item())
#  嘗試在一個epoch中用不同資料訓練
#     process_bar = tqdm(train_no_brackets_data_loader, desc=f"Training no_brackets epoch {epoch}")
#     for batch_x, batch_y, batch_x_lens, batch_y_lens in process_bar:

#         # 標準DL訓練幾板斧
#         optimizer.zero_grad()
#         batch_pred_y = model(batch_x.to(device), batch_x_lens)

#         # 找batch_x的等號位置
#         equ_pos = []
#         for tensor in batch_x:
#           for idx, value in enumerate(tensor):
#             if value.item() == char_to_id['=']:
#                 equ_pos.append(idx)
#         # 把從18個字元中，找出機率最大的就是預測的字元
#         tensor_y = torch.zeros_like(batch_y)
#         for i in range(0,batch_y.size(0)):
#           idx = equ_pos[i]+1
#           for j in range(0,batch_y.size(1)):
#             if j+idx >= batch_pred_y.size(1):
#               break
#             tensor_y[i][j]=torch.argmax(batch_pred_y[i][idx+j],0)
#         tensor_pred_y = torch.empty(batch_y.size(0),batch_y.size(1),18)
#         # 把等號後面部分取出來
#         for i in range(0,batch_y.size(0)):
#           equ = equ_pos[i]+1
#           for j in range(0,batch_y.size(1)):
#             if j+equ >= batch_pred_y.size(1):
#               break
#             for k in range(0,18):
#               # print(batch_pred_y[i][equ+j][k])
#               tensor_pred_y[i][j][k] = batch_pred_y[i][equ+j][k]
#         tensor_pred_y = tensor_pred_y.view(-1,vocab_size)
#         batch_y = batch_y.view(-1).to(device)

#         loss = criterion(tensor_pred_y,batch_y)
#         loss.backward()
#         torch.nn.utils.clip_grad_value_(model.parameters(), grad_clip)
#         optimizer.step()
#         total_loss += loss.item()
#         num_batches += 1
#         i+=1
#         if i%10==0:
#           process_bar.set_postfix(loss=loss.item())
#     process_bar = tqdm(train_positive_data_loader, desc=f"Training positive epoch {epoch}")
#     for batch_x, batch_y, batch_x_lens, batch_y_lens in process_bar:

#         # 標準DL訓練幾板斧
#         optimizer.zero_grad()
#         batch_pred_y = model(batch_x.to(device), batch_x_lens)

#         # 找batch_x的等號位置
#         equ_pos = []
#         for tensor in batch_x:
#           for idx, value in enumerate(tensor):
#             if value.item() == char_to_id['=']:
#                 equ_pos.append(idx)
#         # 把從18個字元中，找出機率最大的就是預測的字元
#         tensor_y = torch.zeros_like(batch_y)
#         for i in range(0,batch_y.size(0)):
#           idx = equ_pos[i]+1
#           for j in range(0,batch_y.size(1)):
#             if j+idx >= batch_pred_y.size(1):
#               break
#             tensor_y[i][j]=torch.argmax(batch_pred_y[i][idx+j],0)
#         tensor_pred_y = torch.empty(batch_y.size(0),batch_y.size(1),18)
#         # 把等號後面部分取出來
#         for i in range(0,batch_y.size(0)):
#           equ = equ_pos[i]+1
#           for j in range(0,batch_y.size(1)):
#             if j+equ >= batch_pred_y.size(1):
#               break
#             for k in range(0,18):
#               # print(batch_pred_y[i][equ+j][k])
#               tensor_pred_y[i][j][k] = batch_pred_y[i][equ+j][k]
#         tensor_pred_y = tensor_pred_y.view(-1,vocab_size)
#         batch_y = batch_y.view(-1).to(device)

#         loss = criterion(tensor_pred_y,batch_y)
#         loss.backward()
#         torch.nn.utils.clip_grad_value_(model.parameters(), grad_clip)
#         optimizer.step()
#         total_loss += loss.item()
#         num_batches += 1
#         i+=1
#         if i%10==0:
#           process_bar.set_postfix(loss=loss.item())
#     process_bar = tqdm(train_negative_data_loader, desc=f"Training negative epoch {epoch}")
#     for batch_x, batch_y, batch_x_lens, batch_y_lens in process_bar:

#         # 標準DL訓練幾板斧
#         optimizer.zero_grad()
#         batch_pred_y = model(batch_x.to(device), batch_x_lens)

#         # 找batch_x的等號位置
#         equ_pos = []
#         for tensor in batch_x:
#           for idx, value in enumerate(tensor):
#             if value.item() == char_to_id['=']:
#                 equ_pos.append(idx)
#         # 把從18個字元中，找出機率最大的就是預測的字元
#         tensor_y = torch.zeros_like(batch_y)
#         for i in range(0,batch_y.size(0)):
#           idx = equ_pos[i]+1
#           for j in range(0,batch_y.size(1)):
#             if j+idx >= batch_pred_y.size(1):
#               break
#             tensor_y[i][j]=torch.argmax(batch_pred_y[i][idx+j],0)
#         tensor_pred_y = torch.empty(batch_y.size(0),batch_y.size(1),18)
#         # 把等號後面部分取出來
#         for i in range(0,batch_y.size(0)):
#           equ = equ_pos[i]+1
#           for j in range(0,batch_y.size(1)):
#             if j+equ >= batch_pred_y.size(1):
#               break
#             for k in range(0,18):
#               # print(batch_pred_y[i][equ+j][k])
#               tensor_pred_y[i][j][k] = batch_pred_y[i][equ+j][k]
#         tensor_pred_y = tensor_pred_y.view(-1,vocab_size)
#         batch_y = batch_y.view(-1).to(device)

#         loss = criterion(tensor_pred_y,batch_y)
#         loss.backward()
#         torch.nn.utils.clip_grad_value_(model.parameters(), grad_clip)
#         optimizer.step()
#         total_loss += loss.item()
#         num_batches += 1
#         i+=1
#         if i%10==0:
#           process_bar.set_postfix(loss=loss.item())
#     process_bar = tqdm(train_total_data_loader, desc=f"Training total epoch {epoch}")
#     for batch_x, batch_y, batch_x_lens, batch_y_lens in process_bar:

#         # 標準DL訓練幾板斧
#         optimizer.zero_grad()
#         batch_pred_y = model(batch_x.to(device), batch_x_lens)

#         # 找batch_x的等號位置
#         equ_pos = []
#         for tensor in batch_x:
#           for idx, value in enumerate(tensor):
#             if value.item() == char_to_id['=']:
#                 equ_pos.append(idx)
#         # 把從18個字元中，找出機率最大的就是預測的字元
#         tensor_y = torch.zeros_like(batch_y)
#         for i in range(0,batch_y.size(0)):
#           idx = equ_pos[i]+1
#           for j in range(0,batch_y.size(1)):
#             if j+idx >= batch_pred_y.size(1):
#               break
#             tensor_y[i][j]=torch.argmax(batch_pred_y[i][idx+j],0)
#         tensor_pred_y = torch.empty(batch_y.size(0),batch_y.size(1),18)
#         # 把等號後面部分取出來
#         for i in range(0,batch_y.size(0)):
#           equ = equ_pos[i]+1
#           for j in range(0,batch_y.size(1)):
#             if j+equ >= batch_pred_y.size(1):
#               break
#             for k in range(0,18):
#               # print(batch_pred_y[i][equ+j][k])
#               tensor_pred_y[i][j][k] = batch_pred_y[i][equ+j][k]
#         tensor_pred_y = tensor_pred_y.view(-1,vocab_size)
#         batch_y = batch_y.view(-1).to(device)

#         loss = criterion(tensor_pred_y,batch_y)
#         loss.backward()
#         torch.nn.utils.clip_grad_value_(model.parameters(), grad_clip)
#         optimizer.step()
#         total_loss += loss.item()
#         num_batches += 1
#         i+=1
#         if i%10==0:
#           process_bar.set_postfix(loss=loss.item())
#     process_bar = tqdm(train_data_loader, desc=f"Training math epoch {epoch}")
#     for batch_x, batch_y, batch_x_lens, batch_y_lens in process_bar:

#         # 標準DL訓練幾板斧
#         optimizer.zero_grad()
#         batch_pred_y = model(batch_x.to(device), batch_x_lens)

#         # 找batch_x的等號位置
#         equ_pos = []
#         for tensor in batch_x:
#           for idx, value in enumerate(tensor):
#             if value.item() == char_to_id['=']:
#                 equ_pos.append(idx)
#         # 把從18個字元中，找出機率最大的就是預測的字元
#         tensor_y = torch.zeros_like(batch_y)
#         for i in range(0,batch_y.size(0)):
#           idx = equ_pos[i]+1
#           for j in range(0,batch_y.size(1)):
#             if j+idx >= batch_pred_y.size(1):
#               break
#             tensor_y[i][j]=torch.argmax(batch_pred_y[i][idx+j],0)
#         tensor_pred_y = torch.empty(batch_y.size(0),batch_y.size(1),18)
#         # 把等號後面部分取出來
#         for i in range(0,batch_y.size(0)):
#           equ = equ_pos[i]+1
#           for j in range(0,batch_y.size(1)):
#             if j+equ >= batch_pred_y.size(1):
#               break
#             for k in range(0,18):
#               # print(batch_pred_y[i][equ+j][k])
#               tensor_pred_y[i][j][k] = batch_pred_y[i][equ+j][k]
#         tensor_pred_y = tensor_pred_y.view(-1,vocab_size)
#         batch_y = batch_y.view(-1).to(device)

#         loss = criterion(tensor_pred_y,batch_y)
#         loss.backward()
#         torch.nn.utils.clip_grad_value_(model.parameters(), grad_clip)
#         optimizer.step()
#         total_loss += loss.item()
#         num_batches += 1
#         i+=1
#         if i%10==0:
#           process_bar.set_postfix(loss=loss.item())

#     average_loss = total_loss / num_batches
#     print(f"Training Loss: {average_loss}")
    # 麻煩各位同學加上 validation 的部分
    total_loss = 0.0
    num_batches = 0
    validation_process_bar = tqdm(validation_data_loader,desc="Validation")
    for batch_x, batch_y, batch_x_lens, batch_y_lens in validation_process_bar:
        # optimizer.zero_grad()
        batch_pred_y = model(batch_x.to(device), batch_x_lens)
        print()
        # 找等号位置
        equ_pos = []
        for tensor in batch_x:
            for idx, value in enumerate(tensor):
                if value.item() == char_to_id['=']:
                    equ_pos.append(idx)

        tensor_pred_y = torch.empty(batch_y.size(0), batch_y.size(1), 18)
        # 把等號後面部分取出來
        for i in range(0, batch_y.size(0)):
            equ = equ_pos[i]+1
            for j in range(0, batch_y.size(1)):
                if j+equ >= batch_pred_y.size(1):
                 break
                for k in range(0, 18):
                    tensor_pred_y[i][j][k] = batch_pred_y[i][equ + j][k]
        # 看tensor_pred_y和batch_y
        for X,tensor,batch in zip(batch_x,tensor_y,batch_y):
          for idx in X:
            if id_to_char[idx.item()] == '=':
              print(id_to_char[idx.item()])
              break
            print(id_to_char[idx.item()],end=' ')
          print('pred_y:',end='')
          for val in tensor:
            if id_to_char[val.item()] == '<eos>':
              break
            print(id_to_char[val.item()],end=" ")
          print()
          print('batch_y:',end='')
          for val in batch:
            if id_to_char[val.item()] == '<eos>':
              break
            print(id_to_char[val.item()],end=" ")
          print()
        tensor_pred_y = tensor_pred_y.view(-1, vocab_size)
        batch_y = batch_y.view(-1).to(device)

        loss = criterion(tensor_pred_y, batch_y)
        total_loss += loss.item()
        num_batches += 1
    average_loss = total_loss / num_batches
    print(f"Validation Loss: {average_loss}")


Training brackets epoch 1: 100%|██████████| 12/12 [08:10<00:00, 40.90s/it, loss=1.98]
Validation:   0%|          | 0/1 [00:00<?, ?it/s]


4 - 1 5 * 1 =
pred_y:1 1 
batch_y:- 1 1 
4 9 * 4 + 1 4 =
pred_y:- 
batch_y:2 1 0 
4 - 1 5 =
pred_y:1 1 
batch_y:- 1 1 
4 - ( 1 5 * 1 ) =
pred_y:1 
batch_y:- 1 1 
4 - 1 4 - 4 9 =
pred_y:
batch_y:- 5 9 
1 * ( 4 + 1 5 ) =
pred_y:
batch_y:1 9 
4 - 1 4 * 4 9 =
pred_y:
batch_y:- 6 8 2 
( 4 + 1 5 ) * 0 =
pred_y:- 2 2 
batch_y:0 
0 * 4 + 1 5 =
pred_y:2 2 1 
batch_y:1 5 
4 + 1 4 - 4 8 =
pred_y:2 
batch_y:- 3 0 
4 + ( 1 5 * 1 ) =
pred_y:- - 
batch_y:1 9 
4 + ( 1 5 - 0 ) =
pred_y:2 
batch_y:1 9 
( 4 - 1 4 ) + 4 8 =
pred_y:1 1 1 
batch_y:3 8 
4 - 1 5 + 0 =
pred_y:
batch_y:- 1 1 
4 9 * ( 4 + 1 4 ) =
pred_y:- 
batch_y:8 8 2 
( 1 5 * 0 ) + 4 =
pred_y:
batch_y:4 
4 + 1 4 * 4 9 =
pred_y:- 
batch_y:6 9 0 
( 4 - 1 5 ) * 1 =
pred_y:- - 
batch_y:- 1 1 
( 4 + 1 5 ) * 1 =
pred_y:- 1 1 
batch_y:1 9 
4 * 1 4 * 4 8 =
pred_y:- 1 1 1 
batch_y:2 6 8 8 
4 + 1 5 - 0 =
pred_y:- 2 2 
batch_y:1 9 
4 - 1 4 + 4 8 =
pred_y:- 
batch_y:3 8 
4 - 1 4 + 4 9 =
pred_y:3 
batch_y:3 9 
( 1 4 * 4 8 ) - 4 =
pred_y:
batch_y:6 6 8 
4

Validation: 100%|██████████| 1/1 [00:00<00:00,  1.41it/s]


Validation Loss: 1.7141104936599731


Training brackets epoch 2: 100%|██████████| 12/12 [08:14<00:00, 41.21s/it, loss=0.734]
Validation:   0%|          | 0/1 [00:00<?, ?it/s]


4 9 * 4 + 1 4 =
pred_y:1 6 8 
batch_y:2 1 0 
4 * 1 4 * 4 9 =
pred_y:- 2 2 
batch_y:2 7 4 4 
4 - ( 1 4 * 4 9 ) =
pred_y:2 6 3 3 
batch_y:- 6 8 2 
4 * 1 5 =
pred_y:- 4 3 
batch_y:6 0 
4 - 1 5 * 1 =
pred_y:0 
batch_y:- 1 1 
( 4 - 1 5 ) * 0 =
pred_y:1 5 1 9 
batch_y:0 
( 1 5 * 0 ) + 4 =
pred_y:2 8 
batch_y:4 
4 - 1 5 =
pred_y:3 3 3 
batch_y:- 1 1 
0 * 4 - 1 5 =
pred_y:- 4 2 
batch_y:- 1 5 
( 4 + 1 5 ) - 0 =
pred_y:1 2 1 1 
batch_y:1 9 
( 4 + 1 5 ) * 0 =
pred_y:3 1 3 6 
batch_y:0 
4 + 1 4 - 4 9 =
pred_y:- 1 3 2 
batch_y:- 3 1 
4 - 1 4 - 4 9 =
pred_y:7 2 
batch_y:- 5 9 
4 + 1 5 * 1 =
pred_y:5 2 
batch_y:1 9 
( 1 4 * 4 9 ) - 4 =
pred_y:3 8 2 7 
batch_y:6 8 2 
4 * 1 4 * 4 8 =
pred_y:4 4 5 3 
batch_y:2 6 8 8 
4 + 1 4 - 4 8 =
pred_y:5 2 2 
batch_y:- 3 0 
4 + 1 5 + 0 =
pred_y:5 6 
batch_y:1 9 
4 + ( 1 4 * 4 9 ) =
pred_y:- 1 3 1 
batch_y:6 9 0 
4 - 1 5 + 0 =
pred_y:3 3 
batch_y:- 1 1 
4 9 * ( 4 + 1 4 ) =
pred_y:6 3 
batch_y:8 8 2 
4 - ( 1 4 + 4 9 ) =
pred_y:5 4 
batch_y:- 5 9 
4 - ( 1 5 * 1 ) =
p

Validation: 100%|██████████| 1/1 [00:00<00:00,  1.46it/s]


5 ) =
pred_y:9 9 4 
batch_y:1 9 
4 * 1 5 * 0 =
pred_y:1 9 0 
batch_y:0 
( 4 + 1 4 ) - 4 9 =
pred_y:- 3 0 
batch_y:- 3 1 
4 - ( 1 5 * 0 ) =
pred_y:7 3 
batch_y:4 
( 1 4 * 4 8 ) - 4 =
pred_y:3 8 
batch_y:6 6 8 
4 - 1 4 + 4 8 =
pred_y:- 2 2 
batch_y:3 8 
Validation Loss: 0.5358591079711914


Training brackets epoch 3: 100%|██████████| 12/12 [08:21<00:00, 41.79s/it, loss=0.0614]
Validation:   0%|          | 0/1 [00:00<?, ?it/s]


4 + ( 1 5 * 0 ) =
pred_y:0 
batch_y:4 
( 4 - 1 5 ) * 1 =
pred_y:3 8 4 
batch_y:- 1 1 
4 - 1 5 * 1 =
pred_y:3 1 1 5 
batch_y:- 1 1 
0 * 4 + 1 5 =
pred_y:- 1 5 0 
batch_y:1 5 
4 - ( 1 4 + 4 8 ) =
pred_y:9 
batch_y:- 5 8 
1 * 4 + 1 5 =
pred_y:5 0 
batch_y:1 9 
4 + 1 5 * 0 =
pred_y:- 1 4 5 5 
batch_y:4 
4 9 * ( 4 + 1 4 ) =
pred_y:2 0 
batch_y:8 8 2 
4 + ( 1 5 * 1 ) =
pred_y:9 2 
batch_y:1 9 
0 * 4 - 1 5 =
pred_y:- 7 8 
batch_y:- 1 5 
4 9 * ( 4 - 1 4 ) =
pred_y:6 5 8 
batch_y:- 4 9 0 
( 4 - 1 4 ) * 4 9 =
pred_y:- 3 0 
batch_y:- 4 9 0 
( 4 - 1 4 ) + 4 8 =
pred_y:4 6 
batch_y:3 8 
4 + ( 1 4 - 4 9 ) =
pred_y:1 9 2 
batch_y:- 3 1 
4 + ( 1 5 - 0 ) =
pred_y:- 1 4 4 
batch_y:1 9 
( 4 + 1 4 ) - 4 9 =
pred_y:- 5 
batch_y:- 3 1 
4 + 1 4 + 4 8 =
pred_y:- 3 5 
batch_y:6 6 
4 + ( 1 4 - 4 8 ) =
pred_y:- 1 4 
batch_y:- 3 0 
( 1 4 * 4 9 ) - 4 =
pred_y:- 3 
batch_y:6 8 2 
4 * 1 5 =
pred_y:4 0 5 0 
batch_y:6 0 
4 9 * 4 + 1 4 =
pred_y:- 8 2 
batch_y:2 1 0 
4 + 1 5 + 0 =
pred_y:- 1 4 4 
batch_y:1 9 
( 1 5 * 0

Validation: 100%|██████████| 1/1 [00:00<00:00,  1.52it/s]

 4 + 1 4 ) - 4 8 =
pred_y:2 6 1 0 
batch_y:- 3 0 
4 * 1 5 * 0 =
pred_y:- 5 1 
batch_y:0 
( 1 5 * 0 ) + 4 =
pred_y:- 1 6 7 3 
batch_y:4 
4 + 1 4 - 4 8 =
pred_y:1 9 2 
batch_y:- 3 0 
4 - 1 4 * 4 9 =
pred_y:0 
batch_y:- 6 8 2 
( 4 + 1 4 ) * 4 9 =
pred_y:3 2 0 
batch_y:8 8 2 
4 - 1 4 + 4 9 =
pred_y:- 5 
batch_y:3 9 
4 * 1 4 * 4 8 =
pred_y:- 1 6 5 3 
batch_y:2 6 8 8 
4 9 * 4 - 1 4 =
pred_y:3 1 
batch_y:1 8 2 
( 4 + 1 5 ) - 0 =
pred_y:2 1 1 6 
batch_y:1 9 
4 + 1 4 * 4 9 =
pred_y:7 9 
batch_y:6 9 0 
( 4 - 1 4 ) + 4 9 =
pred_y:- 1 9 2 
batch_y:3 9 
4 - 1 5 =
pred_y:1 8 
batch_y:- 1 1 
4 + 1 5 * 1 =
pred_y:2 0 2 4 
batch_y:1 9 
1 * ( 4 + 1 5 ) =
pred_y:3 8 0 
batch_y:1 9 
4 - 1 4 - 4 8 =
pred_y:- 3 
batch_y:- 5 8 
4 - 1 5 * 0 =
pred_y:0 
batch_y:4 
4 + 1 4 - 4 9 =
pred_y:- 3 2 
batch_y:- 3 1 
( 4 + 1 5 ) * 1 =
pred_y:1 7 
batch_y:1 9 
( 1 4 * 4 8 ) - 4 =
pred_y:3 0 
batch_y:6 6 8 
4 * 1 4 * 4 9 =
pred_y:0 
batch_y:2 7 4 4 
1 * ( 4 - 1 5 ) =
pred_y:0 
batch_y:- 1 1 
4 - 1 4 + 4 8 =
pred_y:1 1 
b


