# 项目4-序列到序列

## 友情提示
同学们可以前往课程作业区先行动手尝试！！！

## 项目描述
- 英文翻译中文
  - 输入： 一句英文 （e.g.		tom is a student .） 
  - 输出： 中文翻译 （e.g. 		汤姆 是 个 学生 。）

## 数据集介绍
- Data (出自manythings 的 cmn-eng):
  - 训练资料：18000句
  - 检验资料：  500句
  - 测试资料： 2636句
- Format:
  - 不同语言的句子用 TAB ('\t') 分开
  - 字跟字之间用空白分开

## 项目要求
  - 实现seq2seq
  - Teachering Forcing 的功用: 尝试不用 Teachering Forcing 做训练
  - 实现 Attention Mechanism
  - 实现 Beam Search
  - 实现 Schedule Sampling

## 数据准备
已经下好

## 环境配置/安装

无

# 序列到序列介绍
- 大多数常见的 **sequence-to-sequence (seq2seq) model** 为 **encoder-decoder model**，主要由两个部分组成，分别是 **Encoder** 和 **Decoder**，而这两个部分则大多使用 **recurrent neural network (RNN)** 来实作，主要是用来解决输入和输出的长度不一样的情况
- **Encoder** 是将**一连串**的输入，如文字、影片、声音讯号等，编码为**单个向量**，这单个向量可以想像为是整个输入的抽象表示，包含了整个输入的资讯
- **Decoder** 是将 Encoder 输出的单个向量逐步解码，**一次输出一个结果**，直到将最后目标输出被产生出来为止，每次输出会影响下一次的输出，一般会在开头加入 "< BOS >" 来表示开始解码，会在结尾输出 "< EOS >" 来表示输出结束


![seq2seq](https://ai-studio-static-online.cdn.bcebos.com/2a6aa43ecef24003a564104362d3294bc25fa6cc7d314496aa80d405ed920d0c)


# 下载和引入需要的 libraries

In [None]:
%%capture

import paddle
import paddle.nn as nn
import paddle.optimizer as optim
from paddle.io import Dataset, DataLoader
paddle.disable_static()

import numpy as np
import sys
import os
import random
import json



# 资料结构

## 定义资料的转换
- 将不同长度的答案拓展到相同长度，以便训练模型

In [None]:
import numpy as np

class LabelTransform(object):
    def __init__(self, size, pad):
        self.size = size
        self.pad = pad

    def __call__(self, label):
        label = np.pad(label, (0, (self.size - label.shape[0])), mode='constant', constant_values=self.pad)
        return label


## 定义 Dataset
- Data (出自manythings 的 cmn-eng):
  - 训练资料：18000句
  - 检验资料：  500句
  - 测试资料： 2636句

- 资料预处理:
  - 英文：
    - 用 subword-nmt 套件将word转为subword
    - 建立字典：取出标签中出现频率高于定值的subword
  - 中文：
    - 用 jieba 将中文句子断词
    - 建立字典：取出标签中出现频率高于定值的词
  - 特殊字元： < PAD >, < BOS >, < EOS >, < UNK > 
    - < PAD >  ：无意义，将句子拓展到相同长度
    - < BOS >  ：Begin of sentence, 开始字元
    - < EOS >  ：End of sentence, 结尾字元
    - < UNK > ：单字没有出现在字典裡的字
  - 将字典里每个 subword (词) 用一个整数表示，分为英文和中文的字典，方便之后转为 one-hot vector   

- 处理后的档案:
  - 字典：
    - int2word_*.json: 将整数转为文字
    ![int2word_en.json](https://ai-studio-static-online.cdn.bcebos.com/ca259b973e0046bb88c50cd8e1e350b5af1ee2ed6833491c82ecce84f234467d)
    - word2int_*.json: 将文字转为整数
    ![word2int_en.json](https://ai-studio-static-online.cdn.bcebos.com/2f745b301c354acdab93c5425dbd16457f426dac676a4091b8e4252f2fb5fdd7)
    - $*$ 分为英文（en）和中文（cn）
  
  - 训练资料:
    - 不同语言的句子用 TAB ('\t') 分开
    - 字跟字之间用空白分开
    ![data](https://ai-studio-static-online.cdn.bcebos.com/cfe51aeb3f51463eb60fed6d78983a439664d4baf9b94e529a5418e516c03c57)
    


- 在将答案传出去前，在答案开头加入 "< BOS >" 符号，并于答案结尾加入 "< EOS >" 符号

In [None]:
import re
import json

class EN2CNDataset(Dataset):
    def __init__(self, root, max_output_len, set_name):
        self.root = root

        self.word2int_cn, self.int2word_cn = self.get_dictionary('cn')
        self.word2int_en, self.int2word_en = self.get_dictionary('en')

        # 载入资料
        self.data = []
        with open(os.path.join(self.root, f'{set_name}.txt'), "r") as f:
            for line in f:
                self.data.append(line)
        print (f'{set_name} dataset size: {len(self.data)}')

        self.cn_vocab_size = len(self.word2int_cn)
        self.en_vocab_size = len(self.word2int_en)
        self.transform = LabelTransform(max_output_len, self.word2int_en['<PAD>'])

    def get_dictionary(self, language):
        # 载入字典
        with open(os.path.join(self.root, f'word2int_{language}.json'), "r") as f:
            word2int = json.load(f)
        with open(os.path.join(self.root, f'int2word_{language}.json'), "r") as f:
            int2word = json.load(f)
        return word2int, int2word

    def __len__(self):
        return len(self.data)

    def __getitem__(self, Index):
        # 先将中英文分开
        sentences = self.data[Index]
        sentences = re.split('[\t\n]', sentences)
        sentences = list(filter(None, sentences))
        #print (sentences)
        assert len(sentences) == 2

        # 预备特殊字符
        BOS = self.word2int_en['<BOS>']
        EOS = self.word2int_en['<EOS>']
        UNK = self.word2int_en['<UNK>']

        # 在开头添加 <BOS>，在结尾添加 <EOS> ，不在字典的 subword (词) 用 <UNK> 取代
        en, cn = [BOS], [BOS]
        # 将句子拆解为 subword 并转为整数
        sentence = re.split(' ', sentences[0])
        sentence = list(filter(None, sentence))
        #print (f'en: {sentence}')
        for word in sentence:
            en.append(self.word2int_en.get(word, UNK))
        en.append(EOS)

        # 将句子拆解为单词并转为整数
        # e.g. < BOS >, we, are, friends, < EOS > --> 1, 28, 29, 205, 2
        sentence = re.split(' ', sentences[1])
        sentence = list(filter(None, sentence))
        #print (f'cn: {sentence}')
        for word in sentence:
            cn.append(self.word2int_cn.get(word, UNK))
        cn.append(EOS)

        en, cn = np.asarray(en), np.asarray(cn)

        # 用 <PAD> 将句子补到相同长度
        en, cn = self.transform(en), self.transform(cn)
        en, cn = paddle.to_tensor(en), paddle.to_tensor(cn)

        return en, cn


# 模型架构

## Encoder
- seq2seq模型的编码器为RNN。 对于每个输入，，**Encoder** 会输出**一个向量**和**一个隐藏状态(hidden state)**，并将隐藏状态用于下一个输入，换句话说，**Encoder** 会逐步读取输入序列，并输出单个矢量（最终隐藏状态）
- 参数:
  - en_vocab_size 是英文字典的大小，也就是英文的 subword 的个数
  - emb_dim 是 embedding 的维度，主要将 one-hot vector 的单词向量压缩到指定的维度，主要是为了降维和浓缩资讯的功用，可以使用预先训练好的 word embedding，如 Glove 和 word2vector
  - hid_dim 是 RNN 输出和隐藏状态的维度
  - n_layers 是 RNN 要叠多少层
  - dropout 是决定有多少的机率会将某个节点变为 0，主要是为了防止 overfitting ，一般来说是在训练时使用，测试时则不使用
- Encoder 的输入和输出:
  - 输入: 
    - 英文的整数序列 e.g. 1, 28, 29, 205, 2
  - 输出: 
    - outputs: 最上层 RNN 全部的输出，可以用 Attention 再进行处理
    - hidden: 每层最后的隐藏状态，将传递到 Decoder 进行解码


In [None]:
class Encoder(nn.Layer):
    def __init__(self, en_vocab_size, emb_dim, hid_dim, n_layers, dropout):
        super().__init__()
        self.embedding = nn.Embedding(en_vocab_size, emb_dim, sparse=True)
        self.hid_dim = hid_dim
        self.n_layers = n_layers
        self.rnn = nn.GRU(emb_dim, hid_dim, n_layers, dropout=dropout, direction="bidirectional")
        self.dropout = nn.Dropout(dropout)

    def forward(self, input):
        # input = [batch size, sequence len, vocab size]
        embedding = self.embedding(input)
        outputs, hidden = self.rnn(self.dropout(embedding))
        # outputs = [batch size, sequence len, hid dim * directions]
        # hidden =  [num_layers * directions, batch size  , hid dim]
        # outputs 是最上层RNN的輸出

        return outputs, hidden


## Decoder
- **Decoder** 是另一个 RNN，在最简单的 seq2seq decoder 中，仅使用 **Encoder** 每一层最后的隐藏状态来进行解码，而这最后的隐藏状态有时被称为 “content vector”，因为可以想像它对整个前文序列进行编码， 此 “content vector” 用作 **Decoder** 的**初始**隐藏状态， 而 **Encoder** 的输出通常用于 Attention Mechanism
- 参数
  - en_vocab_size 是英文字典的大小，也就是英文的 subword 的个数
  - emb_dim 是 embedding 的维度，是用来将 one-hot vector 的单词向量压缩到指定的维度，主要是为了降维和浓缩资讯的功用，可以使用预先训练好的 word embedding，如 Glove 和 word2vector
  - hid_dim 是 RNN 输出和隐藏状态的维度
  - output_dim 是最终输出的维度，一般来说是将 hid_dim 转到 one-hot vector 的单词向量
  - n_layers 是 RNN 要叠多少层
  - dropout 是决定有多少的机率会将某个节点变为0，主要是为了防止 overfitting ，一般来说是在训练时使用，测试时则不用
  - isatt 是来决定是否使用 Attention Mechanism

- Decoder 的输入和输出:
  - 输入:
    - 前一次解码出来的单词的整数表示
  - 输出:
    - hidden: 根据输入和前一次的隐藏状态，现在的隐藏状态更新的结果
    - output: 每个字有多少机率是这次解码的结果

In [None]:
class Decoder(nn.Layer):
    def __init__(self, cn_vocab_size, emb_dim, hid_dim, n_layers, dropout, isatt):
        super().__init__()
        self.cn_vocab_size = cn_vocab_size
        self.hid_dim = hid_dim * 2
        self.n_layers = n_layers
        self.embedding = nn.Embedding(cn_vocab_size, config.emb_dim)
        self.isatt = isatt
        self.attention = Attention(hid_dim)
        # 如果使用 Attention Mechanism 会使得输入维度变化，请在这里修改
        # e.g. Attention 接在输入后面会使得维度变化，所以输入维度改为
        # self.input_dim = emb_dim + hid_dim * 2 if isatt else emb_dim
        self.input_dim = emb_dim
        self.rnn = nn.GRU(self.input_dim, self.hid_dim, self.n_layers, dropout = dropout)
        self.embedding2vocab1 = nn.Linear(self.hid_dim, self.hid_dim * 2)
        self.embedding2vocab2 = nn.Linear(self.hid_dim * 2, self.hid_dim * 4)
        self.embedding2vocab3 = nn.Linear(self.hid_dim * 4, self.cn_vocab_size)
        self.dropout = nn.Dropout(dropout)

    def forward(self, input, hidden, encoder_outputs):
        # input = [batch size, vocab size]
        # hidden = [batch size, n layers * directions, hid dim]
        # Decoder 只会是单向，所以 directions=1
        input = input.unsqueeze(1)
        embedded = self.dropout(self.embedding(input))
        # embedded = [batch size, 1, emb dim]
        if self.isatt:
            attn = self.attention(encoder_outputs, hidden)
          # TODO: 在这里决定如何使用 Attention，e.g. 相加 或是 接在后面， 请注意维度变化
        output, hidden = self.rnn(embedded, hidden)
        # output = [batch size, 1, hid dim]
        # hidden = [num_layers, batch size, hid dim]

        # 将 RNN 的输出转为每个词出现的机率
        output = self.embedding2vocab1(output.squeeze(1))
        output = self.embedding2vocab2(output)
        prediction = self.embedding2vocab3(output)
        # prediction = [batch size, vocab size]
        return prediction, hidden



## Attention
- 当输入过长，或是单独靠 “content vector” 无法取得整个输入的意思时，用 Attention Mechanism 来提供 **Decoder** 更多的信息
- 主要是根据现在 **Decoder hidden state** ，去计算在 **Encoder outputs** 中，那些与其有较高的关系，根据关系的数值来决定该传给 **Decoder** 那些额外信息 
- 常见 Attention 的实作是用 Neural Network / Dot Product 来算 **Decoder hidden state** 和 **Encoder outputs** 之间的关系，再对所有算出来的数值做 **softmax** ，最后根据过完 **softmax** 的值对 **Encoder outputs** 做 **weight sum**

- TODO:
实现 Attention Mechanism

In [None]:
class Attention(nn.Layer):
    def __init__(self, hid_dim):
        super(Attention, self).__init__()
        self.hid_dim = hid_dim

    def forward(self, encoder_outputs, decoder_hidden):
        # encoder_outputs = [batch size, sequence len, hid dim * directions]
        # decoder_hidden = [num_layers, batch size, hid dim]
        # 一般来说是取 Encoder 最后一层的 hidden state 来做 attention
        ########
        # TODO #
        ########
        attention_energies = paddle.sum(decoder_hidden*encoder_outputs, dim=2)
        attn_weights = nn.functional.softmax(attention_energies, dim=0)
        print(attn_weights.size())
        # attention=None
        return None


## Seq2Seq
- 由 **Encoder** 和 **Decoder** 组成
- 接收输入并传给 **Encoder** 
- 将 **Encoder** 的输出传给 **Decoder**
- 不断地将 **Decoder** 的输出传回 **Decoder** ，进行解码  
- 当解码完成后，将 **Decoder** 的输出传回 

In [None]:
class Seq2Seq(nn.Layer):
    def __init__(self, encoder, decoder):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder
        assert encoder.n_layers == decoder.n_layers, \
                "Encoder and decoder must have equal number of layers!"

    def forward(self, input, target, teacher_forcing_ratio):
        # input  = [batch size, input len, vocab size]
        # target = [batch size, target len, vocab size]
        # teacher_forcing_ratio 是有多少机率使用正确答案来训练
        batch_size = target.shape[0]
        target_len = target.shape[1]
        vocab_size = self.decoder.cn_vocab_size

        # 准备一个储存空间来储存输出
        # outputs = paddle.zeros((batch_size, target_len, vocab_size))
        # outputs = np.zeros((batch_size, target_len, vocab_size))
        outputs = [paddle.zeros((batch_size, vocab_size))]

        # 将输入放入 Encoder
        encoder_outputs, hidden = self.encoder(input)
        # Encoder 最后的隐藏层(hidden state) 用来初始化 Decoder
        # encoder_outputs 主要是使用在 Attention
        # encoder_outputs = [batch size, sequence len, hid dim * directions]
        # 因为 Encoder 是双向的RNN，所以需要将同一层两个方向的 hidden state 接在一起
        # hidden =  [num_layers * directions, batch size  , hid dim]  --> [num_layers, directions, batch size  , hid dim]
        hidden = paddle.reshape(hidden, shape=[self.encoder.n_layers, 2, batch_size, -1])

        hidden = paddle.concat((hidden[:, -2, :, :], hidden[:, -1, :, :]), axis=2)
        # 取的 <BOS> token
        input = target[:, 0]
        preds = []
        for t in range(1, target_len):
            # input [16] hidden [3, 16, 1024] encoder_outputs [16, 50, 1024]
            output, hidden = self.decoder(input, hidden, encoder_outputs)
            # output.size [16, 3805]  hidden size [3, 16, 1024]
            # outputs[:, t, :] = output.numpy()
            outputs.append(output)
            # 决定是否用正确答案来做训练
            teacher_force = random.random() <= teacher_forcing_ratio
            # 取出机率最大的单词
            top1 = output.argmax(1)
            # 如果是 teacher force 则用正解训练，反之用自己预测的单词做预测
            input = target[:, t] if teacher_force and t < target_len else top1
            preds.append(top1.unsqueeze(1))
        preds = paddle.concat(preds, 1)    
        return outputs, preds

    def inference(self, input, target):
        ########
        # TODO #
        ########
        # 在这里实施 Beam Search
        # 此函式的 batch size = 1  
        # input  = [batch size, input len, vocab size]
        # target = [batch size, target len, vocab size]
        batch_size = input.shape[0]
        input_len = input.shape[1]        # 取得最大字数
        vocab_size = self.decoder.cn_vocab_size

        # 准备一个储存空间来储存输出
        outputs = [paddle.zeros((batch_size, vocab_size))]
        # 将输入放入 Encoder
        encoder_outputs, hidden = self.encoder(input)
        # Encoder 最后的隐藏层(hidden state) 用来初始化 Decoder
        # encoder_outputs 主要是使用在 Attention
        # 因为 Encoder 是双向的RNN，所以需要将同一层两个方向的 hidden state 接在一起
        # hidden =  [num_layers * directions, batch size  , hid dim]  --> [num_layers, directions, batch size  , hid dim]
        # hidden = hidden.reshape(self.encoder.n_layers, 2, batch_size, -1)
        hidden = paddle.reshape(hidden, [self.encoder.n_layers, 2, batch_size, -1])
        hidden = paddle.concat((hidden[:, -2, :, :], hidden[:, -1, :, :]), axis=2)
        # 取的 <BOS> token
        input = target[:, 0]
        preds = []
        for t in range(1, input_len):
            output, hidden = self.decoder(input, hidden, encoder_outputs)
            # 将预测结果存起来
            outputs.append(output)
            # 取出机率最大的单词
            top1 = output.argmax(1)
            input = top1
            preds.append(top1.unsqueeze(1))
        preds = paddle.concat(preds, 1)
        return outputs, preds


In [None]:
paddle.disable_static()

# utils
- 基本操作:
  - 储存模型
  - 载入模型
  - 建构模型
  - 将一连串的数字还原回句子
  - 计算 BLEU score
  - 迭代 dataloader
  

## 储存模型

In [None]:
def save_model(model, optimizer, store_model_path, step):
    paddle.save(model.state_dict(), f'{store_model_path}/model_{step}.pdparams')
    return

## 载入模型

In [None]:
def load_model(model, load_model_path):
    print(f'Load model from {load_model_path}')
    state_dict = paddle.load(f'{load_model_path}.pdparams')
    model.set_state_dict(state_dict)

    return model

## 建构模型

In [None]:
def build_model(config, en_vocab_size, cn_vocab_size):
    # 建构模型
    encoder = Encoder(en_vocab_size, config.emb_dim, config.hid_dim, config.n_layers, config.dropout)
    decoder = Decoder(cn_vocab_size, config.emb_dim, config.hid_dim, config.n_layers, config.dropout, config.attention)
    model = Seq2Seq(encoder, decoder)
    # 建构 optimizer
    optimizer = optim.Adam(learning_rate=config.learning_rate, parameters=model.parameters())
    # optimizer = optim.Adam(learning_rate=config.learning_rate, parameters=model.parameters())
    if config.load_model:
        model = load_model(model, config.load_model_path)

    return model, optimizer


## 数字转句子

In [None]:
def tokens2sentence(outputs, int2word):
    sentences = []
    for tokens in outputs:
        sentence = []
        for token in tokens:
            word = int2word[str(int(token))]
            if word == '<EOS>':
                break
            sentence.append(word)
        sentences.append(sentence)
    return sentences


## 计算 BLEU score

In [None]:
import nltk
from nltk.translate.bleu_score import sentence_bleu
from nltk.translate.bleu_score import SmoothingFunction

def computebleu(sentences, targets):
    score = 0 
    assert (len(sentences) == len(targets))

    def cut_token(sentence):
        tmp = []
        for token in sentence:
            if token == '<UNK>' or token.isdigit() or len(bytes(token[0], encoding='utf-8')) == 1:
                tmp.append(token)
            else:
                tmp += [word for word in token]
        return tmp 

    for sentence, target in zip(sentences, targets):
        sentence = cut_token(sentence)
        target = cut_token(target)
        score += sentence_bleu([target], sentence, weights=(1, 0, 0, 0))                                                                                          

    return score


  regargs, varargs, varkwargs, defaults, formatvalue=lambda value: ""
  from collections import Sequence, defaultdict


## 迭代 dataloader

In [None]:
def infinite_iter(data_loader):
    it = iter(data_loader)
    while True:
        try:
            ret = next(it)
            yield ret
        except StopIteration:
            it = iter(data_loader)

## schedule_sampling

In [None]:
########
# TODO #
########

# 请在这里直接 return 0 来取消 Teacher Forcing
# 请在这里实现 schedule_sampling 的策略

def schedule_sampling():
    return 1

# 训练步骤

## 训练
- 训练阶段

In [None]:
def train(model, optimizer, train_iter, loss_function, total_steps, summary_steps, train_dataset):
    model.train()
    losses = []
    for step in range(summary_steps):
        loss = paddle.zeros([1])
        # print("loss at first", loss)

        sources, targets = next(train_iter)
        outputs, preds = model(sources, targets, schedule_sampling())
        # targets 的第一个 token 是 <BOS> 所以忽略

        for i, output in enumerate(outputs):
            if i>0:
                step_loss = loss_function(output, targets[:, i].unsqueeze(1))
                avg_step_loss = paddle.mean(step_loss)
                loss += avg_step_loss

        loss = loss / (targets.shape[1]-1)
        loss.backward()

        optimizer.step()
        optimizer.clear_grad()

        if (step + 1) % 5 == 0:
            print ("\r", "train [{}] loss: {:.3f}, Perplexity: {:.3f}      ".format(total_steps + step + 1, loss.numpy()[0], np.exp(loss.numpy()[0])), end=" ")
            losses.append(loss.numpy()[0])

    return model, optimizer, losses


## 检验/测试
- 防止训练发生overfitting

In [None]:
def test(model, dataloader, loss_function):
    model.eval()
#     loss_sum, bleu_score= 0.0, 0.0
    bleu_score= 0.0
    n = 0
    result = []
    for sources, targets in dataloader:
        batch_size = sources.shape[0]

        outputs, preds = model.inference(sources, targets)
        # targets 的第一个 token 是 <BOS> 所以忽略
        loss = paddle.zeros([1])
        for i, output in enumerate(outputs):
            if i>0:
                step_loss = loss_function(output, targets[:, i].unsqueeze(1))
                avg_step_loss = paddle.mean(step_loss)
                loss += avg_step_loss
        
        loss = loss / (targets.shape[1]-1)

        # 将预测结果转为文字
        preds = tokens2sentence(preds, dataloader.dataset.int2word_cn)

        sources = tokens2sentence(sources, dataloader.dataset.int2word_en)
        targets = tokens2sentence(targets, dataloader.dataset.int2word_cn)

        for source, pred, target in zip(sources, preds, targets):
            result.append((source, pred, target))
        # 计算 Bleu Score
        bleu_score += computebleu(preds, targets)

        n += batch_size

    return loss.numpy()[0], bleu_score / n, result


## 训练流程
- 先训练，再检验

In [None]:
def train_process(config):
    # 准备训练资料
    train_dataset = EN2CNDataset(config.data_path, config.max_output_len, 'training')
    train_loader = DataLoader(train_dataset, batch_size=config.batch_size, shuffle=True, places=paddle.CPUPlace())
    train_iter = infinite_iter(train_loader)
    # 准备检验资料
    val_dataset = EN2CNDataset(config.data_path, config.max_output_len, 'validation')
    val_loader = DataLoader(val_dataset, batch_size=1, places=paddle.CPUPlace())
    # 建构模型
    print("train_dataset.en_vocab_size",train_dataset.en_vocab_size)
    model, optimizer = build_model(config, train_dataset.en_vocab_size, train_dataset.cn_vocab_size)
    loss_function = nn.loss.CrossEntropyLoss(ignore_index=0)
        
    
    train_losses, val_losses, bleu_scores = [], [], []
    total_steps = 0
    while (total_steps < config.num_steps):
        # 训练模型
        model, optimizer, loss = train(model, optimizer, train_iter, loss_function, total_steps, config.summary_steps, train_dataset)
        train_losses += loss
        # 检验模型
        val_loss, bleu_score, result = test(model, val_loader, loss_function)
        val_losses.append(val_loss)
        bleu_scores.append(bleu_score)

        total_steps += config.summary_steps
        print ("\r", "val [{}] loss: {:.3f}, Perplexity: {:.3f}, blue score: {:.3f}       ".format(total_steps, val_loss, np.exp(val_loss), bleu_score))

        # 储存模型和结果
        if total_steps % config.store_steps == 0 or total_steps >= config.num_steps:
            save_model(model, optimizer, config.store_model_path, total_steps)
            with open(f'{config.store_model_path}/output_{total_steps}.txt', 'w') as f:
                for line in result:
                    print (line, file=f)
    
    return train_losses, val_losses, bleu_scores


## 测试流程

In [None]:
def test_process(config):
    # 准备测试资料
    test_dataset = EN2CNDataset(config.data_path, config.max_output_len, 'testing')
    test_loader = DataLoader(test_dataset, batch_size=64, places=paddle.CPUPlace())
    # 建构模型
    model, optimizer = build_model(config, test_dataset.en_vocab_size, test_dataset.cn_vocab_size)
    print ("Finish build model")
    loss_function = nn.loss.CrossEntropyLoss(ignore_index=0)
    model.eval()
    # 测试模型
    test_loss, bleu_score, result = test(model, test_loader, loss_function)
    # 储存结果
    with open(f'{config.store_model_path}/test_output.txt', 'w') as f:
        for line in result:
            print (line, file=f)
    return test_loss, bleu_score


# Config
- 实验的参数设定表

In [None]:
class configurations(object):
    def __init__(self):
        self.batch_size = 128
        self.emb_dim = 256
        self.hid_dim = 512
        self.n_layers = 3
        self.dropout = 0.5
        self.learning_rate = 0.0005
        self.max_output_len = 50              # 最后输出句子的最大长度
        self.num_steps = 12000                # 总训练次数
        self.store_steps = 300                # 训练多少次后须储存模型
        self.summary_steps = 300              # 训练多少次后须检验是否有overfitting
        self.load_model = False               # 是否需载入模型
        self.store_model_path = "work/ckpt"      # 储存模型的位置
        self.load_model_path = "work/ckpt/model_900"           # 载入模型的位置 e.g. "./ckpt/model_{step}" 
        self.data_path = "work/cmn-eng"          # 资料存放的位置
        self.attention = False                # 是否使用 Attention Mechanism


# Main Function
- 读入参数
- 进行训练或是推论

## train

In [21]:
if __name__ == '__main__':
    config = configurations()
    print ('config:\n', vars(config))
    train_losses, val_losses, bleu_scores = train_process(config)


config:
 {'batch_size': 128, 'emb_dim': 256, 'hid_dim': 512, 'n_layers': 3, 'dropout': 0.5, 'learning_rate': 0.0005, 'max_output_len': 50, 'num_steps': 12000, 'store_steps': 300, 'summary_steps': 300, 'load_model': False, 'store_model_path': 'work/ckpt', 'load_model_path': 'work/ckpt/model_900', 'data_path': 'work/cmn-eng', 'attention': False}
training dataset size: 18000
validation dataset size: 500
train_dataset.en_vocab_size 3922
 train [300] loss: 1.117, Perplexity: 3.056       

The hypothesis contains 0 counts of 2-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
The hypothesis contains 0 counts of 3-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()


 val [300] loss: 1.696, Perplexity: 5.454, blue score: 0.202       
 val [600] loss: 1.519, Perplexity: 4.570, blue score: 0.255       
 val [900] loss: 1.316, Perplexity: 3.730, blue score: 0.256       
 val [1200] loss: 1.393, Perplexity: 4.028, blue score: 0.270       
 val [1500] loss: 1.247, Perplexity: 3.480, blue score: 0.300       
 val [1800] loss: 1.239, Perplexity: 3.453, blue score: 0.332       
 val [2100] loss: 1.344, Perplexity: 3.835, blue score: 0.352       
 train [2280] loss: 0.422, Perplexity: 1.524       

KeyboardInterrupt: 

## test

In [23]:
# 在执行 Test 之前，请先行至 config 设定所要载入的模型位置
if __name__ == '__main__':
    config = configurations()
    print ('config:\n', vars(config))
    test_loss, bleu_score = test_process(config)
    print (f'test loss: {test_loss}, bleu_score: {bleu_score}')

config:
 {'batch_size': 128, 'emb_dim': 256, 'hid_dim': 512, 'n_layers': 3, 'dropout': 0.5, 'learning_rate': 0.0005, 'max_output_len': 50, 'num_steps': 12000, 'store_steps': 300, 'summary_steps': 300, 'load_model': False, 'store_model_path': 'work/ckpt', 'load_model_path': 'work/ckpt/model_900', 'data_path': 'work/cmn-eng', 'attention': False}
testing dataset size: 2636
Finish build model


The hypothesis contains 0 counts of 2-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
The hypothesis contains 0 counts of 3-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()


test loss: 1.850272297859192, bleu_score: 0.0024481339031657383


# 图形化训练过程

## 以图表呈现 训练 的 loss 变化趋势

In [24]:
%matplotlib inline
import matplotlib.pyplot as plt
plt.figure()
plt.plot(train_losses)
plt.xlabel('次数')
plt.ylabel('loss')
plt.title('train loss')
plt.show()

NameError: name 'train_losses' is not defined

<Figure size 432x288 with 0 Axes>

## 以图表呈现 检验 的 loss 变化趋势

In [None]:
import matplotlib.pyplot as plt
plt.figure()
plt.plot(val_losses)
plt.xlabel('次数')
plt.ylabel('loss')
plt.title('validation loss')
plt.show()

## BLEU score

In [None]:
import matplotlib.pyplot as plt
plt.figure()
plt.plot(bleu_scores)
plt.xlabel('次数')
plt.ylabel('BLEU score')
plt.title('BLEU score')
plt.show()