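# Auto-regressive Language Generation
Let us first review how auto-regressive language generation works. It assumes that the probability of a word sequence factorizes into a product of conditional next-word probabilities: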

$$P\left(w_{1: T} \mid W_{0}\right)=\prod_{t=1}^{T} P\left(w_{t} \mid w_{1: t-1}, W_{0}\right), \quad w_{1: 0}=\varnothing$$

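Here $W_0$ is the initial context, and each word $w_t$ is predicted from $W_0$ together with the words generated so far, $w_{1:t-1}$. This is what *auto-regressive* means.
- The length $T$ of the generated sequence is usually not fixed in advance; generation stops once the model emits the end-of-sequence (EOS) token.
- Below we introduce four commonly used decoding methods:

    - Greedy search
    - Beam search
    - Top-K sampling
    - Top-p sampling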
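
The factorization can be checked by hand on a tiny example. The sketch below is only an illustration: `TOY_MODEL` is a made-up table in which the next-token distribution depends only on the previous token, whereas a real language model conditions on the whole prefix. The helper `sequence_log_prob` is likewise a hypothetical name. It sums conditional log-probabilities, which is the log-space form of the product.

```python
import math

# Hypothetical toy model (assumption): P(next token | previous token).
TOY_MODEL = {
    "<bos>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.3, "<eos>": 0.2},
    "a": {"cat": 0.1, "dog": 0.2, "<eos>": 0.7},
    "cat": {"<eos>": 1.0},
    "dog": {"<eos>": 1.0},
}


def sequence_log_prob(tokens, context="<bos>"):
    """Sum log P(w_t | previous token) over the sequence (chain rule in log space)."""
    total, prev = 0.0, context
    for tok in tokens:
        total += math.log(TOY_MODEL[prev][tok])
        prev = tok
    return total


seq = ["the", "cat", "<eos>"]
prob = math.exp(sequence_log_prob(seq))
print(round(prob, 6))  # 0.6 * 0.5 * 1.0 -> 0.3
```

Working in log space avoids numerical underflow when many small probabilities are multiplied together.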

In [1]:
from transformers import GPT2Tokenizer, GPT2LMHeadModel
import torch

# Load the GPT-2 model and tokenizer
model_name = "gpt2"
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name)
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

In [2]:
# Input prompt
prompt = "The meaning of life is"
input_ids = tokenizer.encode(prompt, return_tensors="pt").to(device)


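## 1. Greedy Search

**How it works**  
At every step, greedy search picks the single most probable word from the distribution predicted by the model and emits it as the current output.  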

$$x_t = \arg\max_{w \in V} P(w | x_{<t})$$

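**Pros**  
- Simple and intuitive, with low computational cost.  
- The output is usually fairly coherent, especially when the language model is strong.  

**Cons**  
- It easily gets stuck in a local optimum and lacks diversity.  
- In open-ended text generation it tends to produce formulaic, repetitive sentences.  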
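
To make the "local optimum" point concrete, here is a minimal sketch of greedy decoding over a made-up next-token table. `TOY_MODEL` and `greedy_decode` are hypothetical names introduced for this illustration. The toy model conditions only on the previous token, which a real language model does not.

```python
EOS = "<eos>"

# Hypothetical toy model (assumption): P(next token | previous token).
TOY_MODEL = {
    "<bos>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.4, "dog": 0.35, EOS: 0.25},
    "a": {"bird": 0.9, EOS: 0.1},
    "cat": {EOS: 1.0},
    "dog": {EOS: 1.0},
    "bird": {EOS: 1.0},
}


def greedy_decode(start="<bos>", max_steps=10):
    """Pick the arg-max token at every step until EOS or max_steps."""
    tokens, prev, prob = [], start, 1.0
    for _ in range(max_steps):
        dist = TOY_MODEL[prev]
        tok = max(dist, key=dist.get)  # most probable next token
        prob *= dist[tok]
        tokens.append(tok)
        if tok == EOS:
            break
        prev = tok
    return tokens, prob


tokens, prob = greedy_decode()
print(tokens, round(prob, 6))  # ['the', 'cat', '<eos>'] 0.24
```

In this toy table the sequence `a bird <eos>` has probability 0.4 × 0.9 = 0.36, which is higher than the 0.24 found by greedy search. Greedy search misses it because `a` loses to `the` at the very first step.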

In [3]:
# 1. Greedy search
greedy_output = model.generate(
    input_ids,
    max_length=30,
    do_sample=False,  # no sampling: always take the most probable token
    pad_token_id=tokenizer.eos_token_id
)
print("Greedy Search:\n", tokenizer.decode(greedy_output[0], skip_special_tokens=True))


The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.


Greedy Search:
 The meaning of life is not the same as the meaning of death.

The meaning of life is not the same as the meaning of death.


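## 2. Beam Search

**How it works**  
Instead of committing to a single choice, beam search keeps the $k$ most probable candidate sequences at every step.  
Each candidate is extended in the following steps, and the sequence with the highest overall probability among all candidates is returned at the end.  

Objective (beam search only approximates it, since an exact search over $V^T$ is intractable):  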

$$\hat{x} = \arg\max_{x \in V^T} P(x | \text{input})$$

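**Pros**  
- Gets closer to the global optimum than greedy search.  
- Works well on tasks that demand precision, such as machine translation and summarization.  

**Cons**  
- Beam width too small → results are about the same as greedy search.  
- Beam width too large → more computation, and the text may become generic or lack diversity.  
- In open-ended generation tasks (such as story writing), the output often feels "rigid".  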
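
The procedure can be sketched on a small made-up next-token table. `TOY_MODEL` and `beam_search` are hypothetical names used only for this illustration, and the toy model conditions only on the previous token. This is a bare-bones sketch without length normalization or the other refinements that library implementations typically add.

```python
import math

EOS = "<eos>"

# Hypothetical toy model (assumption): P(next token | previous token).
TOY_MODEL = {
    "<bos>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.4, "dog": 0.35, EOS: 0.25},
    "a": {"bird": 0.9, EOS: 0.1},
    "cat": {EOS: 1.0},
    "dog": {EOS: 1.0},
    "bird": {EOS: 1.0},
}


def beam_search(beam_width=2, max_steps=10, start="<bos>"):
    """Keep the beam_width best partial sequences (by log-probability) at each step."""
    beams = [(0.0, [start])]  # (log-probability, tokens)
    for _ in range(max_steps):
        candidates = []
        for score, toks in beams:
            if toks[-1] == EOS:
                candidates.append((score, toks))  # finished: carry forward unchanged
                continue
            for tok, p in TOY_MODEL[toks[-1]].items():
                candidates.append((score + math.log(p), toks + [tok]))
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
        if all(toks[-1] == EOS for _, toks in beams):
            break
    best_score, best_tokens = beams[0]
    return best_tokens[1:], math.exp(best_score)


print(beam_search(beam_width=1))  # behaves like greedy search: "the cat <eos>"
print(beam_search(beam_width=2))  # finds the more probable "a bird <eos>"
```

With a beam width of 1 the search collapses to greedy decoding. With a width of 2 the initially less likely prefix `a` survives long enough for its high-probability continuation to win.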

In [4]:
# 2. Beam search
beam_output = model.generate(
    input_ids,
    max_length=30,
    num_beams=5,      # beam width = 5
    early_stopping=True,
    pad_token_id=tokenizer.eos_token_id
)
print("\nBeam Search:\n", tokenizer.decode(beam_output[0], skip_special_tokens=True))



Beam Search:
 The meaning of life is not the same as the meaning of death. The meaning of life is not the same as the meaning of death.




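## 3. Top-K Sampling

**How it works**  
At each generation step, rather than taking the most probable word directly, the next word is sampled at random from the $K$ most probable words.  
In other words, only the top $K$ candidates are considered: the probabilities of all other words are set to zero, and the remaining ones are renormalized.  

**Pros**  
- Introduces randomness, so the output is more diverse.  
- The value of $K$ lets you trade off quality against diversity.  

**Cons**  
- If $K$ is too small, the result is close to greedy search.  
- If $K$ is too large, low-quality or irrelevant words may be sampled.  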
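
The filter-and-renormalize step can be sketched in plain Python. The distribution `probs` below is made up for illustration, and `top_k_filter` and `sample` are hypothetical helper names, not part of any library.

```python
import random


def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalize so they sum to 1."""
    top = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in top)
    return {tok: p / total for tok, p in top}


def sample(probs, rng):
    """Draw one token according to the given probabilities."""
    tokens, weights = zip(*probs.items())
    return rng.choices(tokens, weights=weights, k=1)[0]


# Made-up next-token distribution (assumption).
probs = {"nice": 0.5, "dog": 0.25, "car": 0.15, "woman": 0.07, "guy": 0.03}

filtered = top_k_filter(probs, k=3)
rng = random.Random(0)  # seeded only to make the demo repeatable
print(filtered)               # only "nice", "dog", "car" remain
print(sample(filtered, rng))  # one of the three surviving tokens
```

Setting `k=1` leaves a single token with probability 1, which is exactly greedy search.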

In [5]:
# 3. Top-K sampling
topk_output = model.generate(
    input_ids,
    max_length=30,
    do_sample=True,
    top_k=50,         # sample only from the 50 most probable tokens
    temperature=1.0,  # temperature: higher values give more random output
    pad_token_id=tokenizer.eos_token_id
)
print("\nTop-K Sampling:\n", tokenizer.decode(topk_output[0], skip_special_tokens=True))


Top-K Sampling:
 The meaning of life is not a mere set of numbers, but the structure of the human heart that generates it.

For more than a century


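## 4. Top-p Sampling (Nucleus Sampling)

**How it works**  
Unlike Top-K, Top-p sampling does not use a fixed number $K$ of candidates. It dynamically selects the smallest set of words whose cumulative probability reaches the threshold $p$, and then samples from that set.  

Find the smallest set $V_p$ such that:  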

$$\sum_{w \in V_p} P(w | x_{<t}) \geq p$$


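**Pros**  
- The number of candidates adapts automatically: the set is small when the model is confident and large when it is not.  
- Usually reads more naturally than Top-K, because it takes the shape of the probability distribution into account.  

**Cons**  
- The parameter $p$ needs tuning (commonly 0.8 to 0.95).  
- If $p$ is too large, low-quality text can still be generated.  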
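
The adaptive behaviour is easy to see on two made-up distributions, one peaked and one flat. `top_p_filter` is a hypothetical helper written for this sketch. It sorts tokens by probability, keeps adding them until the cumulative mass reaches $p$, and renormalizes what is kept.

```python
def top_p_filter(probs, p):
    """Return the smallest top-ranked token set with cumulative probability >= p, renormalized."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for tok, prob in ranked:
        kept.append((tok, prob))
        cumulative += prob
        if cumulative >= p:
            break
    return {tok: prob / cumulative for tok, prob in kept}


# Made-up distributions (assumption): a confident model vs. an uncertain one.
peaked = {"the": 0.92, "a": 0.05, "an": 0.02, "this": 0.01}
flat = {"nice": 0.3, "dog": 0.25, "car": 0.2, "woman": 0.1, "guy": 0.1, "house": 0.05}

print(len(top_p_filter(peaked, 0.9)))  # 1: a single token already covers 0.9
print(len(top_p_filter(flat, 0.9)))    # 5: many tokens are needed to reach 0.9
```

With the same $p$, the candidate set shrinks to one token for the peaked distribution and grows to five for the flat one. A fixed $K$ cannot do this.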

In [6]:
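# 4. Top-p sampling (nucleus sampling)
topp_output = model.generate(
    input_ids,
    max_length=30,
    do_sample=True,
    top_p=0.9,        # sample from the smallest set whose cumulative probability reaches 0.9
    top_k=0,          # turn off Top-K filtering, which generate() may otherwise apply by default (top_k=50)
    temperature=1.0,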
    pad_token_id=tokenizer.eos_token_id
)
print("\nTop-p Sampling:\n", tokenizer.decode(topp_output[0], skip_special_tokens=True))


Top-p Sampling:
 The meaning of life is to make sure you are living for the good of the species.

What is a man's life?

When


In [7]:
# Combined strategy: Top-K + Top-p sampling (no beam search, since num_beams is not set)
output = model.generate(
    input_ids,
    max_length=128,
    do_sample=True,   # enable sampling
    top_p=0.9,        # nucleus sampling
    top_k=50,
    pad_token_id=tokenizer.eos_token_id
)

print(tokenizer.decode(output[0], skip_special_tokens=True))


The meaning of life is to live with fear and for God we have found no other way than to take things for granted," he says.

The church has been in crisis for the last year, in part because of the number of gay men in its ranks, a sign that people are increasingly concerned about the lack of faith and the lack of acceptance they receive.

"They are becoming very fearful of the fact that their community will lose its confidence in the church and is in a position to lose its legitimacy in a church that is as homophobic as it is a civil society," says Sacks.

In the last two


```markdown
The meaning of life is the whole or part of life: it is a system of human activities and their means, not only for the creation and maintenance of life, but also for the social organization and management of life. Life is a living thing, whether it be that which it is capable of producing or is capable of producing. It has its means and its methods of production, of maintenance, and in a very different sense also for the production of the means of life. There are two kinds of life: the natural and the artificial. All natural life is a living thing that is dependent on other living things, which are different from the
```