# Pipeline
Pipeline is one of the basic tools in Hugging Face's Transformers library. You can think of it as an end-to-end, one-call way to use a Transformer model.\
It connects a model with its necessary preprocessing and postprocessing steps, allowing us to directly input any text and get an intelligible answer.

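Given a task, pipeline automatically loads a pretrained model and then runs three steps on your input:
1. Preprocess the input text so the model can read it.
2. Run the model on the processed input.
3. Postprocess the model output so the prediction is human-readable.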
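To make the three steps concrete, here is a toy version in plain Python. It is only a sketch under stated assumptions: the vocabulary, the word-weight "model" and the functions `preprocess`, `forward`, `postprocess` and `toy_pipeline` are hypothetical stand-ins, not Transformers code. The part that loosely resembles the real sentiment pipeline is step 3, where a softmax turns raw logits into a `{'label': ..., 'score': ...}` dict.

```python
import math

# Hypothetical toy vocabulary: word -> token id (real tokenizers use subwords).
VOCAB = {"[UNK]": 0, "good": 1, "nice": 2, "bad": 3, "awful": 4}
# Hypothetical "model": each token id adds to the [NEGATIVE, POSITIVE] logits.
WEIGHTS = {0: (0.0, 0.0), 1: (-2.0, 2.0), 2: (-1.5, 1.5), 3: (2.0, -2.0), 4: (2.5, -2.5)}
ID2LABEL = {0: "NEGATIVE", 1: "POSITIVE"}


def preprocess(text):
    """Step 1: turn raw text into token ids; unknown words map to [UNK]."""
    for ch in ",.!?":
        text = text.replace(ch, " ")
    return [VOCAB.get(word, VOCAB["[UNK]"]) for word in text.lower().split()]


def forward(token_ids):
    """Step 2: the toy model sums per-token weights into one logit per label."""
    neg = sum(WEIGHTS[t][0] for t in token_ids)
    pos = sum(WEIGHTS[t][1] for t in token_ids)
    return [neg, pos]


def postprocess(logits):
    """Step 3: softmax the logits, then report the top label and its probability."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=lambda i: probs[i])
    return {"label": ID2LABEL[best], "score": probs[best]}


def toy_pipeline(inputs):
    """Accept one string or a list of strings and return one dict per input."""
    texts = [inputs] if isinstance(inputs, str) else inputs
    return [postprocess(forward(preprocess(t))) for t in texts]


print(toy_pipeline(["good", "nice", "bad"]))
```

Like the real pipeline, the toy returns a list even for a single string, so callers can treat one input and a batch the same way.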

For example:

```python
from transformers import pipeline

clf = pipeline('sentiment-analysis')
clf('Haha, today is a nice day!')
```

```
[{'label': 'POSITIVE', 'score': 0.9998709559440613}]
```

It also accepts a list of sentences and predicts them all in one call:

```python
clf(['good', 'nice', 'bad'])
```

```
[{'label': 'POSITIVE', 'score': 0.9998160600662231},
 {'label': 'POSITIVE', 'score': 0.9998552799224854},
 {'label': 'NEGATIVE', 'score': 0.999782383441925}]
```

Tasks supported by pipeline include:

- "feature-extraction": will return a FeatureExtractionPipeline.
- "text-classification": will return a TextClassificationPipeline.
- "sentiment-analysis" (alias of "text-classification"): will return a TextClassificationPipeline.
- "token-classification": will return a TokenClassificationPipeline.
- "ner" (alias of "token-classification"): will return a TokenClassificationPipeline.
- "question-answering": will return a QuestionAnsweringPipeline.
- "fill-mask": will return a FillMaskPipeline.
- "summarization": will return a SummarizationPipeline.
- "translation_xx_to_yy": will return a TranslationPipeline.
- "text2text-generation": will return a Text2TextGenerationPipeline.
- "text-generation": will return a TextGenerationPipeline.
- "zero-shot-classification": will return a ZeroShotClassificationPipeline.
- "conversational": will return a ConversationalPipeline.
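The aliases in this list, and the `translation_xx_to_yy` pattern (where `xx` and `yy` are language codes, e.g. `translation_en_to_fr`), can be illustrated with a small lookup. The `resolve_task` helper and `TASK_ALIASES` table below are hypothetical and only mirror the list above; Transformers normalizes task names internally, so you never need to call anything like this yourself.

```python
# Hypothetical alias table, copied from the task list above.
TASK_ALIASES = {
    "sentiment-analysis": "text-classification",
    "ner": "token-classification",
}


def resolve_task(task):
    """Map an alias to its canonical name; parse translation_xx_to_yy into (src, tgt)."""
    task = TASK_ALIASES.get(task, task)
    if task.startswith("translation_"):
        parts = task.split("_")
        # Expected shape: ["translation", src, "to", tgt]
        if len(parts) != 4 or parts[2] != "to":
            raise ValueError(f"Malformed translation task: {task!r}")
        return "translation", (parts[1], parts[3])
    return task, None


print(resolve_task("sentiment-analysis"))
print(resolve_task("translation_en_to_fr"))
```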

## Have a try: Zero-shot-classification
Zero-shot learning means training a model that can predict arbitrary labels, including labels that never appeared in the training set.

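One approach to zero-shot learning is to train an inference model on the NLI (natural language inference, also called textual entailment) task. For example: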
```python
premise = 'Who are you voting for in 2020?'
hypothesis = 'This text is about politics.'
```
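The example above has a premise and a hypothesis; the NLI task is to predict whether the hypothesis holds given the premise.\
Once a model has been trained this way, we can replace "politics" in the hypothesis with any other word, and that is enough to get zero-shot classification.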

NLI (natural language inference) classifies whether two sentences are logically linked, using three labels: contradiction, neutral and entailment.
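The premise/hypothesis trick can be sketched as follows, roughly the way a zero-shot pipeline turns NLI outputs into label scores: each candidate label is slotted into a hypothesis template, the NLI model scores every (premise, hypothesis) pair, and the entailment logits are compared. Everything here is an assumption for illustration: `fake_nli` and its logits are made up (a real setup would run an NLI model), the logit order [contradiction, neutral, entailment] is assumed (real models record it in their config), and the template mirrors the pipeline's `hypothesis_template` parameter, whose default is `"This example is {}."`.

```python
import math

# Assumed logit order returned by the NLI scorer.
CONTRADICTION, NEUTRAL, ENTAILMENT = 0, 1, 2

# Made-up NLI logits standing in for a real model's output.
FAKE_LOGITS = {
    ("A helicopter is flying in the sky", "This example is machine."): [-2.0, 0.5, 3.0],
    ("A helicopter is flying in the sky", "This example is animal."): [2.5, 0.3, -2.0],
}


def fake_nli(premise, hypothesis):
    """Hypothetical NLI model: (premise, hypothesis) -> three logits."""
    return FAKE_LOGITS[(premise, hypothesis)]


def softmax(xs):
    peak = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot(premise, labels, nli, template="This example is {}.", multi_label=False):
    # One NLI call per candidate label, with the label slotted into the hypothesis.
    logits = [nli(premise, template.format(label)) for label in labels]
    if multi_label:
        # Labels scored independently: entailment vs. contradiction for each one.
        scores = [softmax([row[CONTRADICTION], row[ENTAILMENT]])[1] for row in logits]
    else:
        # Labels compete: softmax over the entailment logits across all labels.
        scores = softmax([row[ENTAILMENT] for row in logits])
    ranked = sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)
    return {
        "sequence": premise,
        "labels": [label for label, _ in ranked],
        "scores": [score for _, score in ranked],
    }


print(zero_shot("A helicopter is flying in the sky", ["animal", "machine"], fake_nli))
```

With `multi_label=True` the scores no longer sum to 1, which is what you want when several labels can apply to the same text at once.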

Further reading:
- Official Zero-shot-classification Pipeline documentation: https://huggingface.co/transformers/main_classes/pipelines.html#transformers.ZeroShotClassificationPipeline
- An introduction to zero-shot learning: https://mp.weixin.qq.com/s/6aBzR0O3pwA8-btsuDX82g

```python
clf = pipeline('zero-shot-classification')
clf(sequences=["A helicopter is flying in the sky",
               "A bird is flying in the sky"],
    candidate_labels=['animal', 'machine'])  # candidate labels are entirely up to you
```

```
[{'sequence': 'A helicopter is flying in the sky',
  'labels': ['machine', 'animal'],
  'scores': [0.9938627481460571, 0.006137280724942684]},
 {'sequence': 'A bird is flying in the sky',
  'labels': ['animal', 'machine'],
  'scores': [0.9987970590591431, 0.0012029369827359915]}]
```

## Have a try: Text Generation

```python
# Defaults to gpt2 when no model is given; a specific model can also be passed
generator = pipeline('text-generation', model='liam168/chat-DialoGPT-small-zh')
```

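```python
generator('上午')
```

```
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
[{'generated_text': '上午上班吧'}]
```

The prompt 上午 means "morning", and the generated reply 上午上班吧 means roughly "go to work in the morning".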

## Have a try: Mask Filling

```python
unmasker = pipeline('fill-mask')
```

```python
# The mask token differs between models and is not always <mask>;
# unmasker.tokenizer.mask_token gives the right one for the loaded model.
unmasker('What the <mask>?', top_k=3)
```

```
[{'sequence': 'What the heck?',
  'score': 0.3783760964870453,
  'token': 17835,
  'token_str': ' heck'},
 {'sequence': 'What the hell?',
  'score': 0.32931089401245117,
  'token': 7105,
  'token_str': ' hell'},
 {'sequence': 'What the fuck?',
  'score': 0.14645449817180634,
  'token': 26536,
  'token_str': ' fuck'}]
```
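Fill-mask ranks every vocabulary token by its probability at the mask position and keeps the `top_k` best. The sketch below imitates that ranking and the output format with a made-up five-word vocabulary and made-up logits (`VOCAB`, `MASK_LOGITS` and `fill_mask` are hypothetical). A real model produces one logit per token over a vocabulary of tens of thousands of entries, and the real pipeline decodes the full sequence with the tokenizer rather than doing a string replacement.

```python
import math

# Made-up vocabulary (leading spaces mimic byte-level BPE tokens) and logits
# for the mask position.
VOCAB = [" heck", " hell", " matter", " world", " dog"]
MASK_LOGITS = [3.0, 2.8, 0.5, -1.0, -2.0]


def fill_mask(text, vocab, mask_logits, top_k=3, mask_token="<mask>"):
    """Return the top_k candidates for the mask slot, highest probability first."""
    peak = max(mask_logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in mask_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best_ids = sorted(range(len(vocab)), key=lambda i: probs[i], reverse=True)[:top_k]
    return [
        {
            "sequence": text.replace(mask_token, vocab[i].strip()),
            "score": probs[i],
            "token": i,
            "token_str": vocab[i],
        }
        for i in best_ids
    ]


for candidate in fill_mask("What the <mask>?", VOCAB, MASK_LOGITS):
    print(candidate)
```

Because the softmax runs over the whole vocabulary, the `top_k` scores sum to less than 1, just as in the real output above.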

## More tasks: see the official course
https://huggingface.co/course/chapter1/3?fw=pt