In [1]:
import numpy as np
import pandas as pd
import tensorflow as tf
import tensorflow.keras as keras
import os
import time
import io
import matplotlib.pyplot as plt
import matplotlib as mpl
import seaborn as sns


def initialization(seed=42):
    keras.backend.clear_session()
    np.random.seed(seed)
    tf.random.set_seed(seed)

In [2]:
from matplotlib import font_manager
my_font = font_manager.FontProperties(fname='../Fonts/SourceHanSerifSC-Medium.otf', size=14)

# Exercises

## ex.1

Q: What are the pros and cons of using a `stateful RNN` versus a `stateless RNN`?

> - `Stateless RNNs` (at each training iteration **the model starts with a hidden state full of zeros**, then it updates this state at each time step, and after the last time step it throws it away, as it is not needed anymore) **can only capture patterns whose length is less than, or equal to, the size of the windows the RNN is trained on.** 
> - Conversely, `stateful RNNs` can **capture longer-term patterns.** However, implementing a stateful RNN is much **harder, especially preparing the dataset properly**. Moreover, stateful RNNs do not always work better, in part because consecutive batches are **not independent and identically distributed (IID)**, and Gradient Descent is not fond of non-IID datasets.
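
To make the contrast concrete, here is a small NumPy sketch (an illustration, not Keras code). It uses a hypothetical, untrained tanh RNN with random input weights and a hand-picked recurrent matrix `0.9 * I`, chosen only so that the state keeps a noticeable memory. The stateless run restarts from zeros at each window, whereas the stateful run carries the final state of each window into the next one, so over consecutive windows it ends up exactly where a single pass over the whole series would.

```python
import numpy as np

rng = np.random.default_rng(0)
n_inputs, n_units, window_len = 1, 4, 5
# Hypothetical, untrained weights: random input weights, hand-picked recurrent weights
W_x = rng.normal(size=(n_inputs, n_units))
W_h = 0.9 * np.eye(n_units)
b = np.zeros(n_units)


def run_rnn(window, h0):
    """Run a simple tanh RNN over one window, starting from hidden state h0."""
    h = h0
    for x_t in window:
        h = np.tanh(x_t @ W_x + h @ W_h + b)
    return h


series = rng.normal(size=(3 * window_len, n_inputs))  # one long sequence
windows = series.reshape(3, window_len, n_inputs)     # three consecutive windows

# Stateless: each window starts from zeros, so the last state only reflects the last window
h_stateless = run_rnn(windows[-1], np.zeros(n_units))

# Stateful: the final state of one window becomes the initial state of the next
h = np.zeros(n_units)
for window in windows:
    h = run_rnn(window, h)
h_stateful = h

# Reference: one pass over the whole series
h_full = run_rnn(series, np.zeros(n_units))
print(np.allclose(h_stateful, h_full), np.allclose(h_stateless, h_full))
```

In `tf.keras` (as used in this notebook), the corresponding switch is `stateful=True` on the recurrent layer; it requires a fixed batch size, batches whose rows continue the rows of the previous batch, and a call to `reset_states()` whenever you start over (e.g., at the start of each epoch).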

## ex.2

Q: Why do people use `Encoder–Decoder RNNs` rather than plain `sequence-to-sequence RNNs` for automatic translation?

> In general, if you translate a sentence one word at a time, the result will be terrible. For example, the French sentence “Je vous en prie” means “You are welcome,” but if you translate it one word at a time, you get “I you in pray.” Huh? It is much better to read the whole sentence first and then translate it. A plain `sequence-to-sequence RNN` would **start translating a sentence immediately after reading the first word**, while an `Encoder–Decoder RNN` will **first read the whole sentence and then translate it.** That said, one could imagine a plain sequence-to-sequence RNN that would output silence whenever it is unsure about what to say next (just like human translators do when they must translate a live broadcast).

## ex.3

Q: How can you deal with variable-length input sequences? What about variable-length output sequences?

> - `Variable-length input sequences` can be handled by **padding the shorter sequences so that all sequences in a batch have the same length**, and **using masking to ensure the RNN ignores the padding token.** 
>
>    For better performance, you may also want to **create batches containing sequences of similar sizes**. `Ragged tensors` can hold sequences of variable lengths, and `tf.keras` will likely support them eventually, which will greatly simplify handling variable-length input sequences (this was not yet the case when this answer was originally written; the ex.8 model below does feed ragged tensors to a Keras model through `InputLayer(..., ragged=True)`). In TensorFlow, `tf.ragged.constant()` converts a non-rectangular nested list into a ragged tensor.
>
> 
> - `Regarding variable-length output sequences`:
> 
>    If **the length** of the output sequence **is known** in advance (e.g., if you know that it is the same as the input sequence), then you just need to configure the loss function so that it **ignores tokens** that come after the end of the sequence. Similarly, the code that will use the model should ignore tokens beyond the end of the sequence.
> 
>    But generally the length of the output sequence **is not known** ahead of time, so the solution is to train the model so that **it outputs an end-of-sequence token at the end of each sequence.**

> Ragged tensor guide: https://www.tensorflow.org/guide/ragged_tensor?hl=zh-cn
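
The padding-plus-masking approach can be sketched in plain NumPy. The token IDs are made up, and reserving ID 0 for padding is an assumption (a common convention, and the one `keras.layers.Embedding(..., mask_zero=True)` relies on). The mask marks the real tokens, and averaging a per-token loss over masked positions only means the padding contributes nothing:

```python
import numpy as np

PAD_ID = 0  # assumption: ID 0 is reserved for padding
sequences = [[5, 3, 8], [2, 9], [7, 1, 4, 6]]  # hypothetical token IDs


def pad_batch(seqs, pad_id=PAD_ID):
    """Right-pad every sequence to the length of the longest one."""
    max_len = max(len(s) for s in seqs)
    batch = np.full((len(seqs), max_len), pad_id, dtype=np.int64)
    for i, s in enumerate(seqs):
        batch[i, :len(s)] = s
    return batch


batch = pad_batch(sequences)
mask = batch != PAD_ID  # True for real tokens, False for padding

# A per-token loss (random here) averaged over real tokens only:
# padded positions get weight 0, so they do not affect the result.
per_token_loss = np.random.default_rng(0).random(batch.shape)
masked_mean = (per_token_loss * mask).sum() / mask.sum()
print(batch)
print(mask.sum(axis=1))  # true lengths: [3 2 4]
```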

## ex.4

Q: What is beam search, and why would you use it? What tool can you use to implement it?

> - `Beam search` is a technique used to **improve the performance** of a trained Encoder–Decoder model, for example in a neural machine translation system.
> <img src="../images/other/16-10.png" width="300">
> 
>   The algorithm **keeps track of a short list of the $k$ most promising output sentences** (say, the top three), and at each decoder step it tries to extend them by one word; then it **keeps only the $k$ most likely sentences.** The parameter $k$ is called the `beam width`: the larger it is, the more CPU and RAM will be used, but also, generally, the more accurate the system will be. 
> - Instead of *greedily* choosing the most likely next word at each step to extend a single sentence, **this technique allows the system to explore several promising sentences simultaneously.** Moreover, this technique lends itself well to **parallelization**. 
> - You can implement beam search fairly easily using `TensorFlow Addons`.
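
Here is a minimal pure-Python sketch of beam search (not the TensorFlow Addons API) over a hypothetical toy decoder whose next-token probabilities are hard-coded. The toy model is built so that greedy decoding (beam width 1) commits to `a` at the first step and ends with a less likely sequence than the one a beam of width 2 finds. Scores are summed log-probabilities; real decoders usually also stop beams at an end-of-sequence token and often normalize scores by length, both omitted here.

```python
import math

# Hypothetical toy decoder: next-token probabilities given the tokens generated so far
TOY_MODEL = {
    (): {"a": 0.6, "b": 0.4},
    ("a",): {"a": 0.55, "b": 0.45},
    ("b",): {"a": 0.9, "b": 0.1},
}


def next_token_probs(prefix):
    return TOY_MODEL[prefix]


def beam_search(beam_width, n_steps):
    beams = [((), 0.0)]  # (tokens, total log-probability)
    for _ in range(n_steps):
        candidates = []
        for tokens, log_p in beams:
            # extend every kept sequence by every possible next token
            for token, p in next_token_probs(tokens).items():
                candidates.append((tokens + (token,), log_p + math.log(p)))
        # keep only the beam_width most likely sequences
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return [(tokens, math.exp(log_p)) for tokens, log_p in beams]


print(beam_search(beam_width=1, n_steps=2))  # greedy decoding
print(beam_search(beam_width=2, n_steps=2))
```

Greedy decoding keeps only `a` after the first step and ends with `a a` (probability 0.6 × 0.55 = 0.33), whereas the width-2 beam keeps `b` alive and finds `b a` (0.4 × 0.9 = 0.36).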

## ex.5

Q: What is an attention mechanism? How does it help?

> An `attention mechanism` is a technique initially used in `Encoder–Decoder models` to give the `decoder` more direct access to the input sequence, **allowing it to deal with longer input sequences.**
<img src="../images/other/16-11.png" width="400">
>
> **At each `decoder` time step, the current decoder’s state and the full output of the `encoder` are processed by an alignment model that outputs an alignment score for each input time step.** This score indicates which part of the input is **most relevant** to the current decoder time step. **The weighted sum of the encoder output** (weighted by their alignment score) **is then fed to the decoder**, which produces the next decoder state and the output for this time step. 
>
> $$
> \begin{array}{rlr}
> \alpha_{t s} & =\frac{\exp \left(\operatorname{score}\left(\boldsymbol{h}_{t}, \overline{\boldsymbol{h}}_{s}\right)\right)}{\sum_{s^{\prime}=1}^{S} \exp \left(\operatorname{score}\left(\boldsymbol{h}_{t}, \overline{\boldsymbol{h}}_{s^{\prime}}\right)\right)} & \text { [Attention weights] (1)} \\
> \boldsymbol{c}_{t} & =\sum_{s} \alpha_{t s} \overline{\boldsymbol{h}}_{s} & \text { [Context vector] (2)} \\
> \boldsymbol{a}_{t} & =f\left(\boldsymbol{c}_{t}, \boldsymbol{h}_{t}\right)=\tanh \left(\boldsymbol{W}_{c}\left[\boldsymbol{c}_{t} ; \boldsymbol{h}_{t}\right]\right) & \text { [Attention vector] (3)}
> \end{array}
> $$
>
> Here $\boldsymbol{h}_{t}$ is the current decoder hidden state, $\overline{\boldsymbol{h}}_{s}$ is the encoder output at input time step $s$, $S$ is the number of input time steps, and $\boldsymbol{W}_{c}$ is a learned weight matrix.
> 
> The main benefit of using an attention mechanism is the fact that the `Encoder–Decoder` model **can successfully process longer input sequences.** Another benefit is that the alignment scores make the model **easier to debug and interpret**: for example, if the model makes a mistake, you can look at which part of the input it was paying attention to, and this can help diagnose the issue. An attention mechanism is also at the core of the Transformer architecture, in the Multi-Head Attention layers. See the next answer.
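
Equations (1) to (3) can be traced step by step in a small NumPy sketch. It assumes the simple dot-product score $\operatorname{score}(\boldsymbol{h}_t, \overline{\boldsymbol{h}}_s) = \boldsymbol{h}_t^\top \overline{\boldsymbol{h}}_s$ (one of several possible scoring functions; the formulas above leave `score` open), and the encoder outputs, decoder state, and $\boldsymbol{W}_c$ are random placeholders rather than learned values:

```python
import numpy as np

rng = np.random.default_rng(42)
S, n_units = 6, 8                                     # input length and state size (arbitrary)
h_bar = rng.normal(size=(S, n_units))                 # encoder outputs, one row per input step s
h_t = rng.normal(size=n_units)                        # current decoder hidden state
W_c = rng.normal(size=(n_units, 2 * n_units)) * 0.1   # placeholder for learned weights

# (1) dot-product alignment scores, turned into attention weights by a softmax
scores = h_bar @ h_t                                  # shape (S,)
alpha = np.exp(scores - scores.max())                 # subtract the max for numerical stability
alpha /= alpha.sum()

# (2) context vector: weighted sum of the encoder outputs
c_t = alpha @ h_bar                                   # shape (n_units,)

# (3) attention vector computed from the concatenation [c_t; h_t]
a_t = np.tanh(W_c @ np.concatenate([c_t, h_t]))      # shape (n_units,)
print(np.round(alpha, 3))
```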

## ex.6

Q: What is the most important layer in the `Transformer` architecture? What is its purpose?

> The most important layer in the Transformer architecture is the `Multi-Head Attention layer` (the original Transformer architecture contains 18 of them, including 6 Masked Multi-Head Attention layers). 
> <img src="../images/other/16-23.svg" width="400px">
> It is at the core of language models such as BERT and GPT-2. **Its purpose is to allow the model to identify which words are most aligned with each other, and then improve each word’s representation using these contextual clues.**
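
The core operation can be sketched in NumPy: each word's query is compared with every word's key (scaled dot products), the softmax of those scores says how much each word attends to the others, and the resulting weighted sums of values form the improved, context-aware representations. This is a simplified, unmasked self-attention sketch with random placeholder weights (no dropout, residual connections, or layer normalization), not the exact Transformer or Keras implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(X, W_q, W_k, W_v, W_o, n_heads):
    """X: (seq_len, d_model). Returns the new representations and the attention weights."""
    seq_len, d_model = X.shape
    d_head = d_model // n_heads

    def split_heads(M):  # (seq_len, d_model) -> (n_heads, seq_len, d_head)
        return M.reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)

    Q, K, V = split_heads(X @ W_q), split_heads(X @ W_k), split_heads(X @ W_v)
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)  # (n_heads, seq_len, seq_len)
    weights = softmax(scores)     # how strongly each word attends to every other word
    heads = weights @ V           # (n_heads, seq_len, d_head)
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ W_o, weights


rng = np.random.default_rng(0)
seq_len, d_model, n_heads = 5, 8, 2
X = rng.normal(size=(seq_len, d_model))  # e.g., embeddings of a 5-word sentence
W_q, W_k, W_v, W_o = (rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(4))
out, weights = multi_head_self_attention(X, W_q, W_k, W_v, W_o, n_heads)
print(out.shape, weights.shape)
```

Each head works in its own lower-dimensional subspace, so different heads can pick up different kinds of relationships between words before their outputs are concatenated and projected back to `d_model`.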

## ex.7

Q: When would you need to use `sampled softmax`?

> `Sampled softmax` is used when **training a classification model when there are many classes** (e.g., thousands). It computes an approximation of the cross-entropy loss based on the **logit predicted by the model for the correct class**, and the predicted logits for a sample of incorrect classes (e.g., words). This **speeds up training** considerably compared to computing the softmax over all logits and then estimating the cross-entropy loss. After training, the model can be used normally, using the regular softmax function to compute all the class probabilities based on all the logits.
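
A simplified NumPy sketch of the idea: the exact loss needs a log-sum-exp over all class logits, while the sampled version only uses the true class and a few sampled wrong classes. To allow a comparison, this sketch computes all logits anyway; in a real model the speed-up comes from computing logits only for the sampled classes. Sampling uniformly without any correction is an assumption made for brevity; TensorFlow's `tf.nn.sampled_softmax_loss`, for instance, samples from a log-uniform distribution by default and corrects for the sampling probabilities.

```python
import numpy as np


def log_sum_exp(x):
    m = x.max()
    return m + np.log(np.exp(x - m).sum())


def full_softmax_loss(logits, target):
    """Exact cross-entropy: needs every class's logit."""
    return log_sum_exp(logits) - logits[target]


def sampled_softmax_loss(logits, target, n_sampled, rng):
    """Simplified approximation: the true class plus a uniform sample of wrong classes."""
    n_classes = logits.shape[0]
    wrong_classes = np.delete(np.arange(n_classes), target)
    negatives = rng.choice(wrong_classes, size=n_sampled, replace=False)
    subset = np.concatenate([[target], negatives])
    return log_sum_exp(logits[subset]) - logits[target]


rng = np.random.default_rng(0)
n_classes, target = 50_000, 123   # e.g., a 50,000-word vocabulary
logits = rng.normal(size=n_classes)
print(full_softmax_loss(logits, target))
print(sampled_softmax_loss(logits, target, n_sampled=64, rng=rng))
```

Since the sampled denominator covers only a subset of the classes, this uncorrected version never exceeds the full loss, and it matches it exactly when every wrong class is sampled.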

## ex.8

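Q: Embedded Reber grammars were used by *Hochreiter* and *Schmidhuber* in their paper about `LSTM`s. They are artificial grammars that produce strings such as *BPBTSXXVPSEPE*. Check out *Jenny Orr*'s nice introduction to this topic.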

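Choose a particular `embedded Reber` grammar (such as the one represented on Jenny Orr's page), then train an `RNN` to identify whether a string respects that grammar or not. You will first need to write a function capable of generating a training batch containing about 50% strings that respect the grammar and 50% that don't.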

### Generating Reber and embedded Reber strings

> 1. First we need to **build a function that generates strings based on a Reber grammar**. The grammar will be represented as a list of possible transitions for each state. A transition specifies the **string to output** (or a grammar to generate it) and the **next state**.

<img src="../images/other/16-63.gif" width="400px">

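As shown in the figure above, the grammar is essentially a `directed graph with cycles`. We start at B and move from one node to the next, appending the symbol of each edge we traverse to the string. We stop when we reach the final E. Whenever two paths are possible, e.g., after T we can go to S or X, we pick one at random (with equal probability).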

In [3]:
default_reber_grammar = [
    [("B", 1)],           # (state 0) =B=>(state 1)
    [("T", 2), ("P", 3)], # (state 1) =T=>(state 2) or =P=>(state 3)
    [("S", 2), ("X", 4)], # (state 2) =S=>(state 2) or =X=>(state 4)
    [("T", 3), ("V", 5)], # and so on...
    [("X", 3), ("S", 6)],
    [("P", 4), ("V", 6)],
    [("E", None)]        # (state 6) =E=>(terminal state)
]

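This way, we can generate an infinite number of strings that belong to the rather peculiar Reber language. Check for yourself that the strings on the left below are valid Reber strings, while those on the right are not.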
$$
\begin{array}{|l||l|}
\hline {\text { "Reber" }} & \text { "Non-Reber" } \\
\hline \text { BTSSXXTVVE } & \text { BTSSPXSE } \\
\hline \text { BPVVE } & \text { BPTVVB } \\
\hline \text { BTXXVPSE } & \text { BTXXVVSE } \\
\hline \text { BPVPXVPXVPXVVE } & \text { BPVSPSE } \\
\hline \hline \text { BTSXXVPSE } & \text { BTSSSE } \\
\hline
\end{array}
$$
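
Rather than checking the table by hand, you can walk the transition list mechanically. The following recognizer is a hypothetical helper, not part of the original notebook; it reuses the same `default_reber_grammar` and relies on the fact that, in every state, the outgoing symbols are distinct, so there is never more than one transition to follow:

```python
default_reber_grammar = [
    [("B", 1)],
    [("T", 2), ("P", 3)],
    [("S", 2), ("X", 4)],
    [("T", 3), ("V", 5)],
    [("X", 3), ("S", 6)],
    [("P", 4), ("V", 6)],
    [("E", None)],
]


def is_reber(string, grammar=default_reber_grammar):
    """Follow the unique transition for each character; fail on any missing edge."""
    state = 0
    for char in string:
        if state is None:          # characters left over after the final E
            return False
        transitions = dict(grammar[state])
        if char not in transitions:
            return False
        state = transitions[char]
    return state is None           # must end exactly in the terminal state


reber = ["BTSSXXTVVE", "BPVVE", "BTXXVPSE", "BPVPXVPXVPXVVE", "BTSXXVPSE"]
non_reber = ["BTSSPXSE", "BPTVVB", "BTXXVVSE", "BPVSPSE", "BTSSSE"]
print([is_reber(s) for s in reber])      # [True, True, True, True, True]
print([is_reber(s) for s in non_reber])  # [False, False, False, False, False]
```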

> 2. Let's generate a few strings based on the default Reber grammar

In [4]:
def generate_string(grammar):
    state = 0
    output = []
    while state is not None:
        index = np.random.randint(len(grammar[state]))   # pick one of this state's transitions uniformly at random
        production, state = grammar[state][index]  # e.g., in state 0: production="B", next state=1
#         if isinstance(production, list):  # for embedded_reber_grammar
#             production = generate_string(grammar=production)
        output.append(production)
    return "".join(output)

In [5]:
initialization(seed=42)

for _ in range(25):
    print(generate_string(default_reber_grammar), end=" ")

BTXXTTVPXTVPXTTVPSE BPVPSE BTXSE BPVVE BPVVE BTSXSE BPTVPXTTTVVE BPVVE BTXSE BTXXVPSE BPTTTTTTTTVVE BTXSE BPVPSE BTXSE BPTVPSE BTXXTVPSE BPVVE BPVVE BPVVE BPTTVVE BPVVE BPVVE BTXXVVE BTXXVVE BTXXVPXVVE 

> 3. Looks good. Now let's generate a few strings based on the embedded Reber grammar

<img src="../images/other/16-64.gif" width="400px">

In [6]:
embedded_reber_grammar = [
    [("B", 1)],
    [("T", 2), ("P", 3)],
    [(default_reber_grammar, 4)],
    [(default_reber_grammar, 5)],
    [("T", 6)],
    [("P", 6)],
    [("E", None)]
]

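Embedding a string that satisfies the `Reber Grammar` into a fixed frame yields a string that satisfies the `Embedded Reber Grammar`. This grammar produces two kinds of strings:
- via the top path of the graph: `BT<reber string>TE`
- via the bottom path: `BP<reber string>PE`

To decide whether a string satisfies the `Embedded Reber Grammar`, you need (among other things) to check that its second letter and its second-to-last letter are the same. A learning model therefore needs some kind of memory (remembering the second letter until it reaches the second-to-last one) to correctly decide whether a string satisfies the `Embedded Reber Grammar`. For example, a string of the form `BP<reber string>TE` does not satisfy it.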
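
The same rule can be written as a hypothetical recognizer (again not part of the original notebook): check the frame `B`, `T` or `P`, the matching letter, `E`, then check that the middle part is a plain Reber string. The comparison `string[-2] == string[1]` is exactly the long-range dependency the RNN has to learn:

```python
REBER_GRAMMAR = [
    [("B", 1)], [("T", 2), ("P", 3)], [("S", 2), ("X", 4)],
    [("T", 3), ("V", 5)], [("X", 3), ("S", 6)], [("P", 4), ("V", 6)], [("E", None)],
]


def is_reber(string):
    """Deterministic walk over the plain Reber grammar."""
    state = 0
    for char in string:
        if state is None:
            return False
        transitions = dict(REBER_GRAMMAR[state])
        if char not in transitions:
            return False
        state = transitions[char]
    return state is None


def is_embedded_reber(string):
    """B + (T|P) + <reber string> + the same letter as position 1 + E."""
    return (len(string) >= 4
            and string[0] == "B" and string[-1] == "E"
            and string[1] in "TP"
            and string[-2] == string[1]      # the long-range dependency
            and is_reber(string[2:-2]))


for s in ["BTBPVVETE", "BPBPVVEPE", "BPBPVVETE"]:
    print(s, is_embedded_reber(s))
```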

In [7]:
def generate_string(grammar):
    state = 0
    output = []
    while state is not None:
        index = np.random.randint(len(grammar[state]))   # pick one of this state's transitions uniformly at random
        production, state = grammar[state][index]  # e.g., in state 0: production="B", next state=1
        if isinstance(production, list):  # a nested grammar (embedded_reber_grammar): expand it recursively
            production = generate_string(grammar=production)
        output.append(production)
    return "".join(output)

In [8]:
initialization(42)

for _ in range(25):
    print(generate_string(embedded_reber_grammar), end=" ")

BTBPTTTVPXTVPXTTVPSETE BPBPTVPSEPE BPBPVVEPE BPBPVPXVVEPE BPBTXXTTTTVVEPE BPBPVPSEPE BPBTXXVPSEPE BPBTSSSSSSSXSEPE BTBPVVETE BPBTXXVVEPE BPBTXXVPSEPE BTBTXXVVETE BPBPVVEPE BPBPVVEPE BPBTSXSEPE BPBPVVEPE BPBPTVPSEPE BPBTXXVVEPE BTBPTVPXVVETE BTBPVVETE BTBTSSSSSSSXXVVETE BPBTSSSXXTTTTVPSEPE BTBPTTVVETE BPBTXXTVVEPE BTBTXSETE 

> 4. Okay, now we need a function to generate strings that **do not respect the grammar**. We could generate a random string, but the task would be a bit too easy, so instead we will generate a string that respects the grammar, and we will corrupt it by changing just one character:

Generating strings that break the grammar:

In [9]:
POSSIBLE_CHARS = "BEPSTVX"


def generate_corrupted_string(grammar, chars=POSSIBLE_CHARS):
    good_string = generate_string(grammar)
    index = np.random.randint(len(good_string))
    good_char = good_string[index]
    # Candidate replacements: every possible character except the original one,
    # e.g., sorted(set("BEPSTVX") - {"T"}) -> ['B', 'E', 'P', 'S', 'V', 'X']
    bad_char = np.random.choice(sorted(set(chars) - set(good_char)))
    # Splice the bad character in place of the good one
    return good_string[:index] + bad_char + good_string[index + 1:]

In [10]:
initialization(42)

for _ in range(25):
    print(generate_corrupted_string(embedded_reber_grammar), end=" ")

BTBPTTTPPXTVPXTTVPSETE BPBTXEEPE BPBPTVVVEPE BPBTSSSSXSETE BPTTXSEPE BTBPVPXTTTTTTEVETE BPBTXXSVEPE BSBPTTVPSETE BPBXVVEPE BEBTXSETE BPBPVPSXPE BTBPVVVETE BPBTSXSETE BPBPTTTPTTTTTVPSEPE BTBTXXTTSTVPSETE BBBTXSETE BPBTPXSEPE BPBPVPXTTTTVPXTVPXVPXTTTVVEVE BTBXXXTVPSETE BEBTSSSSSXXVPXTVVETE BTBXTTVVETE BPBTXSTPE BTBTXXTTTVPSBTE BTBTXSETX BTBTSXSSTE 

### Encoding

> 5. We cannot feed strings directly to an RNN, so we need to encode them somehow.

1. One option is to `one-hot encode` each character.
2. Another option is to use `embeddings`.


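Let's go for the second option. For embeddings to work, we need to convert each string into a sequence of character IDs. Let's write a function for this, using each character's index in the string of possible characters `BEPSTVX`: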

In [11]:
POSSIBLE_CHARS = "BEPSTVX"

In [12]:
def string_to_ids(s, chars=POSSIBLE_CHARS):
    return [chars.index(c) for c in s]

In [13]:
string_to_ids("BTTTXXVVETE")

[0, 4, 4, 4, 6, 6, 5, 5, 1, 4, 1]
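
The two encoding options listed earlier are closely related. A NumPy sketch (the helpers `string_to_one_hot` and `ids_to_string` are hypothetical, not from the original notebook): one-hot vectors are rows of an identity matrix, and multiplying a one-hot vector by an embedding matrix just selects one row, which is why an `Embedding` layer can work directly on the IDs as a table lookup. The matrix `E` is random, standing in for trained embeddings:

```python
import numpy as np

POSSIBLE_CHARS = "BEPSTVX"


def string_to_ids(s, chars=POSSIBLE_CHARS):
    return [chars.index(c) for c in s]


def string_to_one_hot(s, chars=POSSIBLE_CHARS):
    """Option 1: one row per character, with a single 1 at that character's index."""
    return np.eye(len(chars), dtype=np.float32)[string_to_ids(s, chars)]


def ids_to_string(ids, chars=POSSIBLE_CHARS):
    """Inverse of string_to_ids, handy for inspecting generated data."""
    return "".join(chars[i] for i in ids)


ids = string_to_ids("BTTTXXVVETE")
one_hot = string_to_one_hot("BTTTXXVVETE")
E = np.random.default_rng(0).normal(size=(len(POSSIBLE_CHARS), 5))  # 5 = embedding_size

# Option 2: an embedding lookup E[ids] gives the same result as one_hot @ E
print(one_hot.shape, E[ids].shape)  # (11, 7) (11, 5)
```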

> 6. We can now generate the dataset, with 50% good strings, and 50% bad strings:

In [14]:
def generate_dataset(size):
    good_strings = [
        string_to_ids(generate_string(embedded_reber_grammar))
        for _ in range(size // 2)
    ]
    bad_strings = [
        string_to_ids(generate_corrupted_string(embedded_reber_grammar))
        for _ in range(size - size // 2)
    ]
    all_strings = good_strings + bad_strings
    X = tf.ragged.constant(all_strings, ragged_rank=1)
    y = np.array([[1.] for _ in range(len(good_strings))] +
                 [[0.] for _ in range(len(bad_strings))])
    return X, y

In [15]:
initialization(42)
 
X_train, y_train = generate_dataset(10000)
X_valid, y_valid = generate_dataset(2000)


In [16]:
X_train[0], y_train[0]

(<tf.Tensor: shape=(22,), dtype=int32, numpy=
 array([0, 4, 0, 2, 4, 4, 4, 5, 2, 6, 4, 5, 2, 6, 4, 4, 5, 2, 3, 1, 4, 1],
       dtype=int32)>,
 array([1.]))

### Building the model

> 7. Perfect! We are ready to create the RNN to identify good strings. We build a simple sequence binary classifier:

In [17]:
initialization(seed=42)

In [18]:
embedding_size = 5

model = keras.models.Sequential([
    # ragged=True: the input placeholder accepts ragged (variable-length) tensors
    keras.layers.InputLayer(input_shape=[None], dtype=tf.int32, ragged=True),
    keras.layers.Embedding(input_dim=len(POSSIBLE_CHARS),
                           output_dim=embedding_size),
    keras.layers.GRU(30),
    keras.layers.Dense(1, activation="sigmoid")
])

In [19]:
optimizer = keras.optimizers.SGD(learning_rate=0.02,
                                 momentum=0.95,
                                 nesterov=True)
model.compile(loss="binary_crossentropy",
              optimizer=optimizer,
              metrics=["accuracy"])
history = model.fit(X_train,
                    y_train,
                    epochs=20,
                    validation_data=(X_valid, y_valid))


> 8. Now let's test our RNN on two tricky strings: the first one is bad while the second one is good. They only differ by the second-to-last character. If the RNN gets this right, it shows that it managed to notice the pattern that the second letter should always be equal to the second-to-last letter. That requires a fairly long short-term memory (which is the reason why we used a GRU cell).
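
Why a gated cell helps here: a rough NumPy sketch of a single GRU step (the `reset_after=False` variant, following the Keras convention `h = z * h_prev + (1 - z) * h_tilde`). The weights are random and the update-gate bias is set by hand, purely to illustrate how a gate close to 1 lets the state survive many time steps, which is what remembering the second character requires; a trained GRU learns such gating from data.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gru_step(x, h_prev, W, U, b):
    """One GRU step (reset_after=False variant).

    W: (3, n_inputs, n_units), U: (3, n_units, n_units), b: (3, n_units);
    index 0 = update gate z, 1 = reset gate r, 2 = candidate state.
    """
    z = sigmoid(x @ W[0] + h_prev @ U[0] + b[0])
    r = sigmoid(x @ W[1] + h_prev @ U[1] + b[1])
    h_tilde = np.tanh(x @ W[2] + (r * h_prev) @ U[2] + b[2])
    # Keras convention: z close to 1 keeps the previous state almost unchanged
    return z * h_prev + (1.0 - z) * h_tilde


rng = np.random.default_rng(0)
n_inputs, n_units, n_steps = 5, 3, 30
W = rng.normal(scale=0.1, size=(3, n_inputs, n_units))
U = rng.normal(scale=0.1, size=(3, n_units, n_units))
b = np.zeros((3, n_units))
b[0] = 10.0  # hypothetical: update gate pushed towards 1, so the cell mostly remembers

h0 = np.array([0.5, -0.5, 0.9])  # pretend this encodes "the 2nd character was P"
h = h0
for x in rng.uniform(size=(n_steps, n_inputs)):
    h = gru_step(x, h, W, U, b)
print(np.round(h - h0, 4))  # the drift after 30 steps stays small
```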

In [20]:
test_strings = ["BPBTSSSSSSSXXTTVPXVPXTTTTTVVETE",   # bad: 2nd char is P, 2nd-to-last char is T
                "BPBTSSSSSSSXXTTVPXVPXTTTTTVVEPE"]   # good: differs only in the second-to-last character
X_test = tf.ragged.constant([string_to_ids(s) for s in test_strings], ragged_rank=1)

In [21]:
y_proba = model.predict(X_test)
print()
print("Estimated probability that these are Reber strings:")
for index, string in enumerate(test_strings):
    print("{}: {:.2f}%".format(string, 100 * y_proba[index][0]))


Estimated probability that these are Reber strings:
BPBTSSSSSSSXXTTVPXVPXTTTTTVVETE: 0.01%
BPBTSSSSSSSXXTTVPXVPXTTTTTVVEPE: 99.97%


> Ta-da! It worked fine. The RNN found the correct answers with very high confidence. :)

## ex.9

> Q: Train an Encoder–Decoder model that can convert a date string from one format to another
>
> (e.g., from `April 22, 2019` to `2019-04-22`).

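See:
- [`Chapter 16: Natural Language Processing with RNNs and Attention (2), 1.2 Encoder–Decoder example`](./第16章%20使用RNN和注意力机制进行自然语言处理(2).ipynb#编码器-解码器示例)
- [`Chapter 16: Natural Language Processing with RNNs and Attention (2), 2.2 Encoder–Decoder example (continued)`](./第16章%20使用RNN和注意力机制进行自然语言处理(2).ipynb#编码器-解码器示例续)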

## ex.10

> Q: Go through TensorFlow's [Neural Machine Translation with Attention tutorial](https://homl.info/nmttuto).

See:
- [Neural machine translation with attention](./nmt_with_attention.ipynb)

## ex.11

> Q: Use one of the recent language models (e.g., BERT) to generate more convincing Shakespearean text.

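The simplest way to use recent language models is to use the excellent `transformers` library, open-sourced by `Hugging Face`. It provides many modern neural network architectures for natural language processing (including `BERT`, `GPT-2`, `RoBERTa`, `XLM`, `DistilBert`, `XLNet`, and more), along with many pretrained models. It relies on either `TensorFlow` or `PyTorch`. Best of all: it is remarkably easy to use.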

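First, let's load a pretrained model. In this example, we will use OpenAI's `GPT` model, with an additional language model head on top (just a linear layer whose weights are tied to the input embeddings).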
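
Tied weights simply mean that the output layer reuses the input embedding matrix, transposed, to turn hidden states into scores over the vocabulary. A toy NumPy sketch, with made-up sizes and a `tanh` standing in for the whole Transformer stack (output biases omitted):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, d_model = 100, 16                # toy sizes; GPT's real ones are much larger
E = rng.normal(size=(vocab_size, d_model))   # input embedding matrix, one row per token

token_ids = np.array([3, 17, 42])
x = E[token_ids]       # input side: embedding lookup -> (3, d_model)
h = np.tanh(x)         # placeholder for the Transformer's output states
logits = h @ E.T       # LM head reuses E (tied weights) -> (3, vocab_size)
next_ids = logits.argmax(axis=-1)
print(logits.shape, next_ids)
```

Tying means the head adds no separate `vocab_size × d_model` weight matrix of its own.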

1. Let's import it and load the pretrained weights:

In [22]:
from transformers import TFOpenAIGPTLMHeadModel

In [23]:
model = TFOpenAIGPTLMHeadModel.from_pretrained("openai-gpt")


All model checkpoint layers were used when initializing TFOpenAIGPTLMHeadModel.

All the layers of TFOpenAIGPTLMHeadModel were initialized from the model checkpoint at openai-gpt.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFOpenAIGPTLMHeadModel for predictions without further training.


2. Next, we need the tokenizer specific to this model. If the `spaCy` and `ftfy` libraries are installed, it will use them; otherwise it falls back to BERT's `BasicTokenizer`, followed by `Byte-Pair Encoding`.

In [24]:
from transformers import OpenAIGPTTokenizer

tokenizer = OpenAIGPTTokenizer.from_pretrained("openai-gpt")


ftfy or spacy is not installed using BERT BasicTokenizer instead of SpaCy & ftfy.


3. Use the tokenizer to tokenize and encode the prompt text:

In [25]:
prompt_text = "This royal throne of kings, this sceptred isle"

encoded_prompt = tokenizer.encode(text=prompt_text,
                                  add_special_tokens=False,
                                  return_tensors="tf")
encoded_prompt

<tf.Tensor: shape=(1, 10), dtype=int32, numpy=
array([[  616,  5751,  6404,   498,  9606,   240,   616, 26271,  7428,
        16187]], dtype=int32)>

In [26]:
tokenizer.convert_ids_to_tokens(encoded_prompt.numpy()[0])

['this</w>',
 'royal</w>',
 'throne</w>',
 'of</w>',
 'kings</w>',
 ',</w>',
 'this</w>',
 'scep',
 'tred</w>',
 'isle</w>']

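4. Next, let's use the model to generate text following the prompt. We will generate 5 different sentences, each starting with the prompt text, followed by 40 additional tokens. To understand what all the hyperparameters do, make sure to check out this great blog post by Patrick von Platen: [How to generate text: using different decoding methods for language generation with Transformers](https://huggingface.co/blog/how-to-generate). You can play with the hyperparameters to try to obtain better results.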
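
For intuition about two of the hyperparameters used below, here is a NumPy sketch of temperature scaling followed by nucleus (top-p) filtering and sampling. It is an illustrative reimplementation, not Hugging Face's code, which differs in details. In the `generate()` call below, `top_k=0` disables top-k filtering and `temperature=1.0` leaves the distribution unchanged, so `top_p=0.9` is what restricts the choices.

```python
import numpy as np


def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose total probability reaches top_p."""
    order = np.argsort(probs)[::-1]             # token indices, most likely first
    sorted_probs = probs[order]
    mass_before = np.cumsum(sorted_probs) - sorted_probs
    keep = mass_before < top_p                  # keep a token while the mass above it is < top_p
    filtered = np.zeros_like(probs)
    filtered[order[keep]] = sorted_probs[keep]
    return filtered / filtered.sum()            # renormalize over the kept tokens


def sample_next_token(logits, temperature=1.0, top_p=0.9, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    scaled = logits / temperature               # < 1 sharpens, > 1 flattens the distribution
    probs = np.exp(scaled - scaled.max())       # numerically stable softmax
    probs /= probs.sum()
    return rng.choice(len(probs), p=top_p_filter(probs, top_p))


probs = np.array([0.5, 0.3, 0.15, 0.05])
print(top_p_filter(probs, 0.9))  # the 0.05 token is dropped, the others are renormalized
```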

In [27]:
num_sequences = 5
length = 40

generated_sequences = model.generate(
    input_ids=encoded_prompt,
    do_sample=True,
    max_length=length + len(encoded_prompt[0]),    # 40+10=50
    temperature=1.0,
    top_k=0,
    top_p=0.9,
    repetition_penalty=1.0,
    num_return_sequences=num_sequences,
)

generated_sequences

<tf.Tensor: shape=(5, 50), dtype=int32, numpy=
array([[  616,  5751,  6404,   498,  9606,   240,   616, 26271,  7428,
        16187,   239,   525,   535,   599,   249,   636,  1370,   803,
        10589,   239,   246,  1913,   522,   867,  1589,   481,  1807,
          498, 17400,   488, 17400,   240,   488,   512,   635,   580,
         3717,   239,   244, 40477,   244,   568,   525,   535,   246,
         6253,   267,   244, 33312, 15735],
       [  616,  5751,  6404,   498,  9606,   240,   616, 26271,  7428,
        16187,  1056,   595,  6175,   485,   768,   239,   481,   618,
          498,  9606,   812,   595,   580,  6413,   485,   799,   485,
         3585,   260,   487,   636,  6959,   481,  7047,  1146,   267,
          244, 40477,  1000,     5, 18098,  1981,   481,  1392,   525,
          524, 18338,   240,  2954,   485],
       [  616,  5751,  6404,   498,  9606,   240,   616, 26271,  7428,
        16187,   240,   759,   595,  1796,   481,   638,   246,  1800,
        17685

5. Decode the generated sequences and print them:

In [28]:
for sequence in generated_sequences:
    text = tokenizer.decode(sequence, clean_up_tokenization_spaces=True)
    print(text)
    print("=" * 10)

this royal throne of kings, this sceptred isle. that's what i would call some perspective. a step or two outside the city of merchants and merchants, and you could be rich. " 
 " but that's a stretch! " anheg objected
this royal throne of kings, this sceptred isle does not belong to us. the king of kings will not be content to go to pieces - he would demand the crown himself! " 
 while eomer passed the word that his archers, calling to
this royal throne of kings, this sceptred isle, can not change the way a person regards it, in so doing they become pedant. and that is why i am needed on this campaign of regathering people to share the kingdom with the pe
this royal throne of kings, this sceptred isle will no longer be interred with that of all the ibis. now i would like for you to tell me of this logres, and then i will ask your questions, for i am an
this royal throne of kings, this sceptred isle is considered to be the greatest treasure ever carried, and the burial mounds of the dea

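You can try more recent (and larger) models, such as `GPT-2`, `CTRL`, `Transformer-XL` or `XLNet`, which are all available as pretrained models in the `transformers` library, including variants with a language model on top. The preprocessing steps vary slightly between models, so make sure to check out the text generation example in the `transformers` documentation (this example uses `PyTorch`, but it will work with very few tweaks, such as adding `TF` at the beginning of the model class name, removing the `.to()` method calls, and using `return_tensors="tf"` instead of `"pt"`).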

6. Chinese text generation with a Chinese `GPT2` model.

In [29]:
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("uer/gpt2-chinese-cluecorpussmall")


In [30]:
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("uer/gpt2-chinese-cluecorpussmall")


In [31]:
from transformers import TextGenerationPipeline

text_generator = TextGenerationPipeline(model, tokenizer)

In [32]:
text_generator("有一个聋哑的中国男孩", max_length=500, do_sample=True)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': '有一个聋哑的中国男孩 ， 他 是 聋 哑 人 ， 如 果 这 个 中 国 男 孩 要 离 开 中 国 后 去 外 国 发 展 ， 他 就 是 个 女 孩 ， 他 要 找 一 个 大 学 念 完 了 、 长 大 了 就 去 当 保 镖 的 外 国 女 孩 ， 要 找 一 个 小 的 中 国 男 孩 ， 要 找 一 个 有 文 化 但 是 特 别 有 能 力 的 外 国 女 孩 。 所 以 如 果 你 觉 得 听 过 这 样 一 句 话 后 就 能 体 会 到 这 个 男 孩 子 是 一 个 什 么 样 ， 他 做 什 么 事 情 会 对 中 国 的 教 育 和 对 人 生 有 更 多 的 影 响 ， 听 到 这 句 话 后 ， 我 觉 得 ， 这 个 男 孩 子 这 样 讲 话 就 是 给 我 们 提 供 了 一 个 学 习 和 思 考 机 会 。 中 国 人 的 思 维 是 很 灵 活 、 很 灵 活 、 很 有 创 造 力 的 ， 他 会 做 自 己 ， 会 做 别 人 的 事 情 ， 一 个 不 断 学 习 的 人 总 会 受 别 人 的 喜 欢 ， 所 以 在 我 们 这 个 世 界 ， 那 些 在 中 国 工 作 、 在 美 国 工 作 、 读 大 学 的 年 轻 人 ， 都 拥 有 一 个 创 造 创 新 的 奇 思 妙 想 ， 这 其 实 是 一 个 很 好 的 学 习 环 境 、 很 好 的 生 活 环 境 ， 大 家 都 知 道 自 己 能 看 的 更 多 ， 所 以 到 中 国 来 做 很 多 事 情 和 创 新 也 能 体 会 到 他 们 的 思 维 方 式 ， 对 中 国 的 教 育 和 对 人 生 的 影 响 。 你 可 以 在 这 种 环 境 里 面 ， 有 可 能 看 到 这 种 人 ， 有 可 能 看 到 这 种 老 的 中 国 的 孩 子 ， 或 者 有 可 能 看 到 很 多 中 国 的 孩 子 ， 或 者 在 一 边 看 不 见 的 中 国 孩 子 ， 这 些 老 者 。 我 觉 得 这 个 环 境 里 面 有 这 样 的 人 ， 他 们 把 自 己 的 思 维 方 式 、 思 维 方 式 都 有 了 很 多 方 法 ， 这 些 才 能 够 去 体 会 到 我 们 在 这 个 过 程 当 

> create:Apotosome 05/19/22

> update:Apotosome 10/26/22