# Sentiment analysis with Pipeline

In [1]:
from transformers import pipeline
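# When only the task is given, a default model is used (not recommended)
pipe = pipeline("sentiment-analysis")  # "sentiment-analysis" selects the sentiment analysis task
pipe("今儿上海可真冷啊 ")  # Input means "It's really cold in Shanghai today"; output: sentiment label and confidence score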

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


Device set to use cpu


[{'label': 'NEGATIVE', 'score': 0.8957216739654541}]
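The warning above recommends naming the model and revision explicitly. A minimal sketch of that, assuming the same checkpoint and revision that the log reports as the default; the helper name `build_sentiment_pipeline` is hypothetical. Note that this checkpoint is an English SST-2 model, so the score it gives for a Chinese sentence should be read with caution.

```python
# Model id and revision copied from the warning printed by the default pipeline.
MODEL_ID = "distilbert/distilbert-base-uncased-finetuned-sst-2-english"
REVISION = "714eb0f"


def build_sentiment_pipeline():
    """Build a sentiment-analysis pipeline pinned to an explicit model and revision."""
    # Imported inside the helper so that defining it does not load
    # transformers or download any weights.
    from transformers import pipeline

    return pipeline("sentiment-analysis", model=MODEL_ID, revision=REVISION)
```

Calling `build_sentiment_pipeline()` returns an object that is used like `pipe` above. Pinning the revision keeps results reproducible if the default checkpoint changes in a later library release.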

# Question answering with Pipeline

In [3]:
from transformers import pipeline
question_answerer = pipeline(task="question-answering")  # Create a question-answering pipeline; the default is a DistilBERT model fine-tuned on SQuAD.

No model was supplied, defaulted to distilbert/distilbert-base-cased-distilled-squad and revision 564e9b5 (https://huggingface.co/distilbert/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.


Device set to use cpu


In [4]:
preds = question_answerer(
    question="What is the name of the repository?",
    context="The name of the repository is huggingface/transformers"
)

print(
    f"score: {round(preds['score'], 4)}, start: {preds['start']}, "
    f"end: {preds['end']}, answer: {preds['answer']}"
)

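# Returns: the model's confidence, the character index in the context where the answer starts, the character index where it ends (exclusive), and the answer text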

score: 0.9333, start: 30, end: 54, answer: huggingface/transformers
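The `start` and `end` values are character offsets into `context`, so the answer can be recovered with a plain slice. A small check using the numbers printed above (no model is needed to run it):

```python
context = "The name of the repository is huggingface/transformers"

# Offsets copied from the output printed above.
start, end = 30, 54

# start is inclusive and end is exclusive, so they work directly as a slice.
answer = context[start:end]
print(answer)
```

Here `end` equals `len(context)` because the answer runs to the end of the context string.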


In [5]:
preds = question_answerer(
    question="What is the capital of China?",
    context="On 1 October 1949, Chairman Mao Zedong formally proclaimed the People's Republic of China in Tiananmen Square, Beijing."
)

In [6]:
print(
    f"score: {round(preds['score'], 4)}, start: {preds['start']}, "
    f"end: {preds['end']}, answer: {preds['answer']}"
)

score: 0.9683, start: 111, end: 118, answer: Beijing


# Speech recognition with Pipeline

In [None]:
from transformers import pipeline
transcriber = pipeline(task="automatic-speech-recognition", model="openai/whisper-small")


In [None]:
text = transcriber("data/audio/mlk.flac")
text

# Image recognition with Pipeline

In [7]:
# Create a pipeline for the image-classification task
classifier = pipeline(task="image-classification")

# Classify the image with the default model
preds = classifier(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg"
)

No model was supplied, defaulted to google/vit-base-patch16-224 and revision 3f49326 (https://huggingface.co/google/vit-base-patch16-224).
Using a pipeline without specifying a model name and revision in production is not recommended.


Fast image processor class <class 'transformers.models.vit.image_processing_vit_fast.ViTImageProcessorFast'> is available for this model. Using slow image processor class. To use the fast image processor class set `use_fast=True`.
Device set to use cpu


In [8]:
# Format the results (round scores to 4 decimal places)
preds = [{"score": round(pred["score"], 4), "label": pred["label"]} for pred in preds]
print(*preds, sep="\n")

{'score': 0.4335, 'label': 'lynx, catamount'}
{'score': 0.0348, 'label': 'cougar, puma, catamount, mountain lion, painter, panther, Felis concolor'}
{'score': 0.0324, 'label': 'snow leopard, ounce, Panthera uncia'}
{'score': 0.0239, 'label': 'Egyptian cat'}
{'score': 0.0229, 'label': 'tiger cat'}
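The result is a list of label/score dictionaries, so ordinary Python is enough to post-process it. A short sketch using the five predictions printed above (copied by hand, not recomputed): it picks the top label and sums the listed scores. The sum is well below 1, which suggests the remaining probability mass belongs to classes that are not shown.

```python
# Predictions copied from the output above (only the labels that were printed).
preds = [
    {"score": 0.4335, "label": "lynx, catamount"},
    {"score": 0.0348, "label": "cougar, puma, catamount, mountain lion, painter, panther, Felis concolor"},
    {"score": 0.0324, "label": "snow leopard, ounce, Panthera uncia"},
    {"score": 0.0239, "label": "Egyptian cat"},
    {"score": 0.0229, "label": "tiger cat"},
]

# Highest-scoring prediction.
best = max(preds, key=lambda p: p["score"])

# Total probability covered by the listed labels.
total = round(sum(p["score"] for p in preds), 4)

print(best["label"], total)
```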


# Managing the tokenizer and model with AutoClass

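In most cases, the model (network architecture) you want to use can be inferred from the name or path of the pretrained model that you pass to the from_pretrained() method.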
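One way to picture this inference: a checkpoint's configuration names its architecture (for example through a `model_type` field), and a lookup table maps that name to a concrete class. The toy sketch below illustrates only that dispatch idea. Every name in it (`ToyBertModel`, `ToyDistilBertModel`, `MODEL_REGISTRY`, `toy_auto_model`) is hypothetical, and it is not the transformers implementation.

```python
from dataclasses import dataclass


@dataclass
class ToyBertModel:
    hidden_size: int


@dataclass
class ToyDistilBertModel:
    hidden_size: int


# Hypothetical registry: maps the "model_type" field of a config to a class.
MODEL_REGISTRY = {"bert": ToyBertModel, "distilbert": ToyDistilBertModel}


def toy_auto_model(config: dict):
    """Pick a model class from config["model_type"] and instantiate it."""
    model_type = config["model_type"]
    try:
        model_cls = MODEL_REGISTRY[model_type]
    except KeyError:
        raise ValueError(f"Unrecognized model_type: {model_type!r}") from None
    return model_cls(hidden_size=config["hidden_size"])


# A config dict standing in for the contents of a checkpoint's config file.
model = toy_auto_model({"model_type": "bert", "hidden_size": 768})
print(type(model).__name__)
```

The caller never names the class; it only supplies the configuration, which is the convenience the Auto classes offer.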

In [9]:
from transformers import AutoTokenizer, AutoModel

# Load the Chinese BERT model and its tokenizer
tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
model = AutoModel.from_pretrained("bert-base-chinese")

