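Out-of-the-box pipelines
The Transformers library groups current NLP tasks into the following categories:

Text classification: for example, sentiment analysis or sentence-pair relation classification;
Token classification (classifying the words in a text): for example, part-of-speech (POS) tagging or named entity recognition (NER);
Text generation: for example, completing a preset template (prompt) or predicting words that have been masked out of a text;
Extracting answers from text: for example, extracting the answer to a given question from a passage;
Generating new sentences from input text: for example, machine translation or automatic summarization.
The most basic object in the Transformers library is the pipeline() function, which wraps a pretrained model together with its preprocessing and postprocessing steps. We only need to pass in text to get the expected answer. Commonly used pipelines include: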

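feature-extraction (obtain vector representations of text)
fill-mask (fill in masked words or spans)
ner (named entity recognition)
question-answering (question answering)
sentiment-analysis (sentiment analysis)
summarization (automatic summarization)
text-generation (text generation)
translation (machine translation)
zero-shot-classification (classification without task-specific training examples)
Below, we use several common NLP tasks to show how to call these pipelines.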

In [2]:
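# Sentiment analysis
# With the sentiment-analysis pipeline, we only need to pass in text to get its sentiment label (positive/negative) and the corresponding probability: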

from transformers import pipeline

classifier = pipeline("sentiment-analysis")
result = classifier("I've been waiting for a HuggingFace course my whole life.")
print(result)
results = classifier(
  ["I've been waiting for a HuggingFace course my whole life.", "I hate this so much!"]
)
print(results)

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cuda:0


[{'label': 'POSITIVE', 'score': 0.9598045349121094}]
[{'label': 'POSITIVE', 'score': 0.9598045349121094}, {'label': 'NEGATIVE', 'score': 0.9994558691978455}]


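A pipeline automatically performs the following three steps:

Preprocess the text into a format the model can understand;
Feed the preprocessed text into the model;
Postprocess the model's predictions into a human-readable format.
When no model is specified, a pipeline selects a default pretrained model for the task. For sentiment analysis, for example, the default is the fine-tuned English sentiment model distilbert-base-uncased-finetuned-sst-2-english.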
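The log above warns against relying on the default model in production. A minimal sketch of pinning the checkpoint explicitly, assuming the model name and revision printed in that log. The helper name build_sentiment_classifier is hypothetical, and the import is kept inside the function so that defining it does not trigger a download:

```python
# Checkpoint and revision copied from the log output above.
MODEL_ID = "distilbert/distilbert-base-uncased-finetuned-sst-2-english"
REVISION = "714eb0f"


def build_sentiment_classifier():
    """Return a sentiment-analysis pipeline with the checkpoint pinned.

    Hypothetical helper for illustration; calling it requires the
    `transformers` package and downloads the model on first use.
    """
    from transformers import pipeline  # lazy import: only needed when called

    return pipeline("sentiment-analysis", model=MODEL_ID, revision=REVISION)
```

Calling build_sentiment_classifier()("I hate this so much!") should then behave like the default pipeline, but without the "No model was supplied" warning.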

In [3]:
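# Zero-shot classification
# The zero-shot-classification pipeline lets us define our own classification labels without providing any labeled data.
# As the log shows, the pipeline automatically selected the pretrained facebook/bart-large-mnli model for this task.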
from transformers import pipeline

classifier = pipeline("zero-shot-classification")
result = classifier(
    "This is a course about the Transformers library",
    candidate_labels=["education", "politics", "business"],
)
print(result)

No model was supplied, defaulted to facebook/bart-large-mnli and revision d7645e1 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cuda:0


{'sequence': 'This is a course about the Transformers library', 'labels': ['education', 'business', 'politics'], 'scores': [0.8445958495140076, 0.1119765117764473, 0.04342762380838394]}
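The result is a plain dictionary in which labels and scores are parallel lists, ordered from the highest score to the lowest. A small sketch of reading it, using the output printed above as literal data so that no model is needed. The 0.5 threshold is an arbitrary assumption for illustration:

```python
# Output of the zero-shot pipeline above, pasted as literal data.
result = {
    "sequence": "This is a course about the Transformers library",
    "labels": ["education", "business", "politics"],
    "scores": [0.8445958495140076, 0.1119765117764473, 0.04342762380838394],
}

# Pair each label with its score.
label_scores = dict(zip(result["labels"], result["scores"]))

# The lists are ordered by score, so the first entry is the best label.
best_label = result["labels"][0]

# Keep only labels above an (arbitrary) confidence threshold.
confident = [label for label, score in label_scores.items() if score >= 0.5]

print(best_label)  # education
print(confident)   # ['education']
# In this run the scores sum to (almost exactly) 1.
print(round(sum(result["scores"]), 6))  # 1.0
```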


In [4]:
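# Text generation
# The text-generation pipeline continues a given prompt; num_return_sequences sets how many continuations are returned.
from transformers import pipeline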

generator = pipeline("text-generation")
results = generator("In this course, we will teach you how to")
print(results)
results = generator(
    "In this course, we will teach you how to",
    num_return_sequences=2,
    max_length=50
) 
print(results)

No model was supplied, defaulted to openai-community/gpt2 and revision 607a30d (https://huggingface.co/openai-community/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cuda:0
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Both `max_new_tokens` (=256) and `max_length`(=50) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/tex

[{'generated_text': "In this course, we will teach you how to build your own self-sustaining system, to build your own self-sustaining economy, and to learn how to make money from your own self-sustaining business.\n\nThis course will start with the basics of the self-sustaining system. By the time you finish, you will be able to start to think about how to make money from your own self-sustaining business.\n\nYou will also have a good understanding of the concept of self-sustaining (how to do it, how to make money and how to do it as a business) and how to make money from the self-sustaining economy.\n\nWe'll also teach you how to build your own self-sustaining economy in order to make money from your own self-sustaining business.\n\nWhile at this point, you should have a solid understanding of the concept of self-sustaining, you will also have the ability to start to think about how it can be done.\n\nThe next step is to build your own self-sustaining economy and make money from it.\

In [5]:
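# Instead of the default model, we can pass a specific checkpoint through the model argument, here distilgpt2:
from transformers import pipeline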

generator = pipeline("text-generation", model="distilgpt2")
results = generator(
    "In this course, we will teach you how to",
    max_length=30,
    num_return_sequences=2,
)
print(results)

Device set to use cuda:0
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Both `max_new_tokens` (=256) and `max_length`(=30) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


[{'generated_text': "In this course, we will teach you how to use the word “o”, as we'll use the term to describe a specific object (a variable).\n\n\n\nLet's make a simple example on a small number of devices. It's simple, and a quick solution has an approach, but it will require a bunch of time to do something simple. First, let's implement it in a simple way. First, let's use a simple example.\npublic class DellsApp { public class DellsApp { public class DellsApp { public class DellsApp { public class CellsApp { public class DellsApp { public class CellsApp { public class CellsApp { public class CellsApp { public class CellsApp { public class CellsApp { public class CellsApp { public class CellsApp { public class CellsApp { public class CellsApp { public class CellsApp { public class CellsApp { public class CellsApp { public class CellsApp { public class CellsApp { public class CellsApp { public class CellsApp { public class CellsApp { public class CellsApp { public class CellsApp {
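The log above notes that max_new_tokens (=256) takes precedence over max_length, which is why the generated text runs well past 30 tokens. A hedged sketch of one way to get shorter continuations is to pass max_new_tokens directly. The helper name generate_short and the value 20 are assumptions for illustration, and do_sample=True is set explicitly so that several distinct sequences can be returned:

```python
def generate_short(prompt, max_new_tokens=20, num_return_sequences=2):
    """Generate a few short continuations of `prompt` with distilgpt2.

    Hypothetical helper; calling it requires the `transformers` package
    and downloads the model on first use.
    """
    from transformers import pipeline  # lazy import: only needed when called

    generator = pipeline("text-generation", model="distilgpt2")
    outputs = generator(
        prompt,
        max_new_tokens=max_new_tokens,  # counts only newly generated tokens
        num_return_sequences=num_return_sequences,
        do_sample=True,  # sampling allows distinct sequences
    )
    # Each output is a dict with a "generated_text" key, as in the cells above.
    return [out["generated_text"] for out in outputs]
```

For example, generate_short("In this course, we will teach you how to") should return two strings that each extend the prompt by at most 20 tokens.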

In [6]:
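# We can also use the language tags on the left (on the Model Hub page) to pick models for other languages. For example, load gpt2-chinese-poem, a model dedicated to generating classical Chinese poetry: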
#pip install --upgrade torch torchvision torchaudio

from transformers import pipeline

generator = pipeline("text-generation", model="uer/gpt2-chinese-poem")
results = generator(
    "[CLS] 万 叠 春 山 积 雨 晴 ，",
    max_length=40,
    num_return_sequences=2,
)
print(results)

Device set to use cuda:0
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Both `max_new_tokens` (=256) and `max_length`(=40) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


[{'generated_text': '[CLS] 万 叠 春 山 积 雨 晴 ， 堂 飒 自 然 。 然 北 户 风 卷 地 ， 恍 若 南 柯 梦 一 声 。 雷 破 柱 禹 无 穴 ， 排 云 排 空 汉 有 台 。 莫 言 万 里 封 侯 乐 ， 便 有 人 歌 定 赋 来 。 得 归 来 始 是 归 ， 故 园 花 柳 正 依 依 。 已 知 生 事 无 休 日 ， 未 抵 长 吟 惜 落 晖 。 万 里 沧 波 一 棹 归 ， 钓 丝 风 飏 荻 花 飞 。 江 湖 无 限 鱼 羹 饭 ， 日 日 高 堂 望 白 衣 。 城 女 儿 ， 水 神 女 江 波 臣 身 木 老 ， 女 花 貌 ， 风 花 貌 。 三 寸 寸 千 丈 一 寸 万 丈 。 不 盈 一 ， 三 重 一 寸 。 一 寸 管 ， 四 千 重 九 万 千 。 一 斤 两 两 斤 两 条 九 千 。 一 斤 两 两 头 ， 百 寻 二 条 九 寸 。 著 一 ， 一 把 三 枚 千 。 一 条 九 。 一 条 九 寸 。 一 条 九 织 一 条 九 孔 二 ， 一 条 九 头 二 斤 麻 三 机 断 ， 一 弦 一 把 两 条 九 孔 二 。 一 条 九 头 一 条 九 头 千 斤 两 弦 一 条'}, {'generated_text': '[CLS] 万 叠 春 山 积 雨 晴 ， 我 心 更 苦 。 予 望 青 山 几 重 ， 云 山 千 里 横 高 峰 。 看 云 飞 云 忽 止 ， 一 身 不 见 云 相 似 。 云 行 云 止 自 此 归 ， 云 去 云 来 何 所 止 。 日 日 寻 山 山 更 过 ， 山 中 人 不 知 山 多 。 云 自 高 飞 云 自 閒 ， 云 亦 不 飞 云 自 閒 。 云 自 悠 悠 山 自 閒 ， 我 心 何 与 白 云 閒 。 云 閒 云 出 无 拘 束 ， 山 在 云 兮 山 在 屋 。 山 不 移 ， 云 触 。 云 还 飞 ， 云 触 之 驰 ， 云 去 ， 云 还 飞 还 还 还 还 还 飞 。 云 去 。 不 归 之 飞 。 云 去 ， 云 去 ， 云 去 ， 云 去 住 ， 云 飞 去 还 还 还 归 还 归 兮 我 。 云 去 兮 何 来 ， 云 飞 兮 空 兮 空 。 云 归 兮 不 飞 兮 空 。 云 去 不 住 ， 云 

In [7]:
# ValueError: Due to a serious vulnerability issue in `torch.load`, even with `weights_only=True`, 
# we now require users to upgrade torch to at least v2.6 in order to use the function. 
# This version restriction does not apply when loading files with safetensors.
import torch
print(f"Current PyTorch version: {torch.__version__}")

Current PyTorch version: 2.8.0+cu126


In [None]:
# nvcc --version  #  in terminal to check the version of CUDA

In [8]:
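# Given a text in which some words have been masked out, use a pretrained model to predict the words that fill those positions.
# Like the text generation introduced above, this task also builds a template and lets the model complete it, which is known as a cloze prompt. For more details, see the article "Introduction to Prompt Methods" (《Prompt 方法简介》).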

from transformers import pipeline

unmasker = pipeline("fill-mask")
results = unmasker("This course will teach you all about <mask> models.", top_k=2)
print(results)

No model was supplied, defaulted to distilbert/distilroberta-base and revision fb53ab8 (https://huggingface.co/distilbert/distilroberta-base).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at distilbert/distilroberta-base were not used when initializing RobertaForMaskedLM: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use cuda:0


[{'score': 0.19619786739349365, 'token': 30412, 'token_str': ' mathematical', 'sequence': 'This course will teach you all about mathematical models.'}, {'score': 0.040527161210775375, 'token': 38163, 'token_str': ' computational', 'sequence': 'This course will teach you all about computational models.'}]


In [None]:
# The named entity recognition (NER) pipeline extracts entities of specified types from text, such as persons, locations, and organizations.

from transformers import pipeline

ner = pipeline("ner", grouped_entities=True)
results = ner("My name is Sylvain and I work at Hugging Face in Brooklyn.")
print(results)

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision 4c53496 (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use cuda:0


[{'entity_group': 'PER', 'score': np.float32(0.9981694), 'word': 'Sylvain', 'start': 11, 'end': 18}, {'entity_group': 'ORG', 'score': np.float32(0.9796019), 'word': 'Hugging Face', 'start': 33, 'end': 45}, {'entity_group': 'LOC', 'score': np.float32(0.9932105), 'word': 'Brooklyn', 'start': 49, 'end': 57}]
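Grouping merges adjacent tokens that belong to the same entity into one span. The following is a simplified sketch of that idea, not the library's implementation: it assumes word-level tokens and hand-written tags (both hypothetical) and merges consecutive words that share an entity type, so two different neighbouring entities of the same type would also be merged.

```python
import re

text = "My name is Sylvain and I work at Hugging Face in Brooklyn."

# Hand-assigned entity types per word (hypothetical; a real model predicts these).
tags = {"Sylvain": "PER", "Hugging": "ORG", "Face": "ORG", "Brooklyn": "LOC"}


def group_entities(text, tags):
    """Merge consecutive tagged words of the same type into entity spans."""
    groups = []
    previous_type = None
    for match in re.finditer(r"\w+", text):
        entity_type = tags.get(match.group())  # None for untagged words
        if entity_type is None:
            previous_type = None
            continue
        if entity_type == previous_type:
            # Extend the current span to the end of this word.
            groups[-1]["end"] = match.end()
        else:
            groups.append(
                {"entity_group": entity_type, "start": match.start(), "end": match.end()}
            )
        previous_type = entity_type
    # Recover the surface text of each span from its character offsets.
    for group in groups:
        group["word"] = text[group["start"]:group["end"]]
    return groups


for entity in group_entities(text, tags):
    print(entity)
```

The character offsets produced this way match the start and end values in the pipeline output above, for example 33 to 45 for "Hugging Face".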




In [10]:
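# The question-answering pipeline answers a question based on a given context, for example:
# The question-answering pipeline here is actually an extractive QA model: it extracts the answer from the given context rather than generating it.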

from transformers import pipeline

question_answerer = pipeline("question-answering")
answer = question_answerer(
    question="Where do I work?",
    context="My name is Sylvain and I work at Hugging Face in Brooklyn",
)
print(answer)

No model was supplied, defaulted to distilbert/distilbert-base-cased-distilled-squad and revision 564e9b5 (https://huggingface.co/distilbert/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cuda:0


{'score': 0.6949766278266907, 'start': 33, 'end': 45, 'answer': 'Hugging Face'}


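By form, question answering (QA) systems fall into three types:

Extractive QA: assumes the answer is contained in the document, so the answer is extracted directly from it;
Multiple-choice QA: selects the answer from several given options, much like a reading-comprehension test;
Free-form QA: generates the answer text directly, with no restriction on its format.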
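For the extractive case, the start and end values in the pipeline output above are character offsets into the context, so the answer is literally a slice of the input. A small check, using the printed output as literal data:

```python
context = "My name is Sylvain and I work at Hugging Face in Brooklyn"

# Output of the question-answering pipeline above, pasted as literal data.
answer = {"score": 0.6949766278266907, "start": 33, "end": 45, "answer": "Hugging Face"}

# Slice the context with the reported character offsets.
extracted = context[answer["start"]:answer["end"]]

print(extracted)                      # Hugging Face
print(extracted == answer["answer"])  # True
```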

In [11]:
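# The summarization pipeline compresses a long text into a short one while preserving as much of the original's key information as possible, for example: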

from transformers import pipeline

summarizer = pipeline("summarization")
results = summarizer(
    """
    America has changed dramatically during recent years. Not only has the number of 
    graduates in traditional engineering disciplines such as mechanical, civil, 
    electrical, chemical, and aeronautical engineering declined, but in most of 
    the premier American universities engineering curricula now concentrate on 
    and encourage largely the study of engineering science. As a result, there 
    are declining offerings in engineering subjects dealing with infrastructure, 
    the environment, and related issues, and greater concentration on high 
    technology subjects, largely supporting increasingly complex scientific 
    developments. While the latter is important, it should not be at the expense 
    of more traditional engineering.

    Rapidly developing economies such as China and India, as well as other 
    industrial countries in Europe and Asia, continue to encourage and advance 
    the teaching of engineering. Both China and India, respectively, graduate 
    six and eight times as many traditional engineers as does the United States. 
    Other industrial countries at minimum maintain their output, while America 
    suffers an increasingly serious decline in the number of engineering graduates 
    and a lack of well-educated engineers.
    """
)
print(results)

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cuda:0


[{'summary_text': ' America has changed dramatically during recent years . The number of engineering graduates in the U.S. has declined in traditional engineering disciplines such as mechanical, civil, electrical, chemical, and aeronautical engineering . Rapidly developing economies such as China and India, as well as other industrial countries in Europe and Asia, continue to encourage and advance engineering .'}]


In [None]:
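# What do these pipelines do behind the scenes?
# These easy-to-use pipelines actually wrap many operations; let's look at what happens behind the scenes.
# Taking the first sentiment-analysis pipeline as an example, we run the following code: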

from transformers import pipeline

classifier = pipeline("sentiment-analysis")
result = classifier("I've been waiting for a HuggingFace course my whole life.")
print(result)


No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cuda:0


[{'label': 'POSITIVE', 'score': 0.9598045349121094}]


In [None]:

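# This gives a result like:

# [{'label': 'POSITIVE', 'score': 0.9598048329353333}]

# Behind the scenes, it goes through three steps:

# Preprocessing: convert the raw text into an input format the model accepts;
# Feed the processed input into the model;
# Postprocessing: convert the model's output into a format that is easy for humans to read.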

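Because neural network models cannot process text directly, the preprocessing step must first convert the text into numbers the model can understand. Specifically, we use the tokenizer that belongs to each model to:

Split the input into words, subwords, or symbols (such as punctuation), collectively called tokens;
Map each token to its token ID (simply an integer) according to the model's vocabulary;
Add any extra inputs the model requires.
Our preprocessing of the input text must match exactly what was done when the model was pretrained; only then will the model work properly. Note that each model has its own preprocessing, so if you are unfamiliar with the model you want to use, you can look it up on the Model Hub. Here we use the AutoTokenizer class and its from_pretrained() method, which automatically fetches the right tokenizer from the model checkpoint name.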
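The three tokenizer steps can be sketched with a toy tokenizer. This is a simplification under stated assumptions: it lowercases and splits on words and punctuation instead of using the model's real subword algorithm, and its tiny vocabulary (the name TOY_VOCAB is hypothetical) contains only the IDs that the real tokenizer prints for the sentence "I hate this so much!" in this document.

```python
import re

# Tiny vocabulary; IDs taken from the tokenizer output shown in this document.
TOY_VOCAB = {
    "[PAD]": 0, "[CLS]": 101, "[SEP]": 102,
    "i": 1045, "hate": 5223, "this": 2023, "so": 2061, "much": 2172, "!": 999,
}


def tokenize(text):
    """Step 1: split lowercased text into word and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


def convert_tokens_to_ids(tokens):
    """Step 2: map each token to its ID in the vocabulary."""
    return [TOY_VOCAB[token] for token in tokens]


def add_special_tokens(ids):
    """Step 3: add the extra inputs the model expects ([CLS] ... [SEP])."""
    return [TOY_VOCAB["[CLS]"]] + ids + [TOY_VOCAB["[SEP]"]]


tokens = tokenize("I hate this so much!")
input_ids = add_special_tokens(convert_tokens_to_ids(tokens))
print(tokens)     # ['i', 'hate', 'this', 'so', 'much', '!']
print(input_ids)  # [101, 1045, 5223, 2023, 2061, 2172, 999, 102]
```

A word outside TOY_VOCAB raises a KeyError here, whereas a real tokenizer falls back to subwords or an unknown token.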

The default checkpoint of the sentiment-analysis pipeline is distilbert-base-uncased-finetuned-sst-2-english. Below we download and call its tokenizer manually:



In [13]:

from transformers import AutoTokenizer

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

raw_inputs = [
    "I've been waiting for a HuggingFace course my whole life.",
    "I hate this so much!",
]
inputs = tokenizer(raw_inputs, padding=True, truncation=True, return_tensors="pt")
print(inputs)

{'input_ids': tensor([[  101,  1045,  1005,  2310,  2042,  3403,  2005,  1037, 17662, 12172,
          2607,  2026,  2878,  2166,  1012,   102],
        [  101,  1045,  5223,  2023,  2061,  2172,   999,   102,     0,     0,
             0,     0,     0,     0,     0,     0]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]])}
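With padding=True, the shorter sequence is padded to the length of the longest one, and the attention mask marks real tokens with 1 and padding with 0. A dependency-free sketch of that step, using the token IDs printed above (the function name pad_batch is hypothetical):

```python
def pad_batch(sequences, pad_id=0):
    """Pad ID lists to the longest length and build matching attention masks."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


batch = pad_batch([
    [101, 1045, 1005, 2310, 2042, 3403, 2005, 1037, 17662, 12172,
     2607, 2026, 2878, 2166, 1012, 102],
    [101, 1045, 5223, 2023, 2061, 2172, 999, 102],
])
print(batch["input_ids"][1])
print(batch["attention_mask"][1])
```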


In [14]:
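# Feed the preprocessed input into the model
# Pretrained models are downloaded in much the same way as tokenizers: the Transformers package provides an AutoModel class with a matching from_pretrained() method. Below we manually download this distilbert-base model: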

from transformers import AutoModel

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModel.from_pretrained(checkpoint)

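The body of a pretrained model contains only the base Transformer module. For a given input, it outputs the activations of its neurons, called hidden states or features. For an NLP model, these can be understood as a high-dimensional semantic representation of the text. The hidden states are usually fed into another part of the model, called the head, to complete a specific task; for example, a classification head performs text classification.

In fact, all the pipelines shown above share a similar model structure; only the last part of the model uses a different head to complete the corresponding task.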

[Figure: transformer_and_head]

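The Transformers library wraps many different architectures. Common ones include:

*Model (returns the hidden states)
*ForCausalLM (for causal language modeling)
*ForMaskedLM (for masked language modeling)
*ForMultipleChoice (for multiple-choice tasks)
*ForQuestionAnswering (for question answering)
*ForSequenceClassification (for text classification)
*ForTokenClassification (for token classification, such as NER)
The output of the Transformer module is a three-dimensional tensor of shape (Batch size, Sequence length, Hidden size). Batch size is the number of samples (text sequences) per input, i.e. how many sentences are fed in at once, which is 2 in the example above. Sequence length is the length of the text sequence, i.e. the number of tokens each sentence is split into, which is 16 in the example above (after padding). Hidden size is the dimension of the output vector (semantic representation) that the model produces for each token.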

In [15]:
# We can print the output shape of the distilbert-base model used here:

from transformers import AutoTokenizer, AutoModel

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModel.from_pretrained(checkpoint)

raw_inputs = [
    "I've been waiting for a HuggingFace course my whole life.",
    "I hate this so much!",
]
inputs = tokenizer(raw_inputs, padding=True, truncation=True, return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)

torch.Size([2, 16, 768])


In [16]:
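# The outputs of Transformers models behave like a namedtuple or a dictionary: they can be accessed by attribute as above, by key (outputs["last_hidden_state"]), or even by index (outputs[0]).

# For sentiment analysis, we clearly need a text classification head in the end. So in practice we use AutoModelForSequenceClassification rather than AutoModel: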

from transformers import AutoTokenizer
from transformers import AutoModelForSequenceClassification

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

raw_inputs = [
    "I've been waiting for a HuggingFace course my whole life.",
    "I hate this so much!",
]
inputs = tokenizer(raw_inputs, padding=True, truncation=True, return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits.shape)

torch.Size([2, 2])


In [17]:
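# The model's output is just a set of numbers, which is not easy for humans to read. For example, let's print the output of the example above: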

from transformers import AutoTokenizer
from transformers import AutoModelForSequenceClassification

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

raw_inputs = [
    "I've been waiting for a HuggingFace course my whole life.",
    "I hate this so much!",
]
inputs = tokenizer(raw_inputs, padding=True, truncation=True, return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits)

tensor([[-1.5607,  1.6123],
        [ 4.1692, -3.3464]], grad_fn=<AddmmBackward0>)


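The model outputs [-1.5607, 1.6123] for the first sentence and [4.1692, -3.3464] for the second. These are not probabilities but logits, the raw outputs of the model's last layer. To convert them into probabilities, we pass them through a softmax layer, for example: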

In [18]:
import torch
predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
print(predictions)

tensor([[4.0195e-02, 9.5980e-01],
        [9.9946e-01, 5.4418e-04]], grad_fn=<SoftmaxBackward0>)


In [19]:
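# The id2label mapping in the model config tells us which label each output position corresponds to:
print(model.config.id2label)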

{0: 'NEGATIVE', 1: 'POSITIVE'}
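Putting the last two steps together reproduces the pipeline's output format. A minimal sketch in plain Python, with the logits and the id2label mapping pasted from the outputs above. The softmax is written out by hand here only to keep the example dependency-free:

```python
import math

# Logits and label mapping copied from the outputs above.
logits = [[-1.5607, 1.6123], [4.1692, -3.3464]]
id2label = {0: "NEGATIVE", 1: "POSITIVE"}


def softmax(row):
    """Convert a list of logits into probabilities that sum to 1."""
    largest = max(row)
    exps = [math.exp(x - largest) for x in row]  # shift by the max for stability
    total = sum(exps)
    return [e / total for e in exps]


results = []
for row in logits:
    probs = softmax(row)
    best = max(range(len(probs)), key=probs.__getitem__)  # index of the top class
    results.append({"label": id2label[best], "score": probs[best]})

for result in results:
    print(result["label"], round(result["score"], 4))
# POSITIVE 0.9598
# NEGATIVE 0.9995
```

These values agree with the scores returned by the sentiment-analysis pipeline at the start of this section.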
