# Hugging Face

## 0. Environment Setup

- Install the basic Python dependencies in a conda environment:
    ```bash
    # Request python so the environment gets its own interpreter and pip
    conda create -n transformer python
    conda activate transformer
    pip install transformers datasets evaluate peft accelerate gradio optimum sentencepiece jupyterlab scikit-learn pandas matplotlib tensorboard nltk rouge
    ```

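- For network problems, edit the hosts file on the local machine. *This step has not been verified.*
  - The hosts file is located at: *C:\Windows\System32\drivers\etc\hosts*
  - Add the following entries: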

## 1. Pipeline Tool

### Pipeline Concepts


[Pipeline Official Doc](https://huggingface.co/transformers/main_classes/pipelines.html#transformers.pipeline)

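- A pipeline is a ready-made wrapper: you declare it once and then call it directly to run a predefined NLP task.

- The general call syntax, showing the most commonly used optional parameters, is `pipeline(task=..., model=..., tokenizer=..., device=...)`.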

![20230716105734](https://michael-1313341240.cos.ap-shanghai.myqcloud.com/20230716105734.png)


#### Tasks Supported by Pipeline


- The main values of `task` are:
  - Text classification (including sentiment analysis): `task = "text-classification"` `task = "zero-shot-classification"`
  - Token classification (including named entity recognition, NER): `task = "token-classification"`
  - Conversation: `task = "conversational"`
  - Document question answering: `task = "document-question-answering"`
  - Question answering: `task = "question-answering"`
  - Table question answering: `task = "table-question-answering"`
  - Fill-mask: `task = "fill-mask"`
  - Summarization: `task = "summarization"`
  - Text-to-text generation: `task = "text2text-generation"`
  - Machine translation: `task = "translation"`
  - Cross-modal tasks: `task = "image-to-text"` `task = "visual-question-answering"`


*The following command lists **all tasks supported by pipeline**:*

```python
from transformers.pipelines import SUPPORTED_TASKS

# Print one (task name, task spec) pair per line
print(*SUPPORTED_TASKS.items(), sep="\n")
```

Output (truncated):

```text
('audio-classification', {'impl': <class 'transformers.pipelines.audio_classification.AudioClassificationPipeline'>, 'tf': (), 'pt': (<class 'transformers.models.auto.modeling_auto.AutoModelForAudioClassification'>,), 'default': {'model': {'pt': ('superb/wav2vec2-base-superb-ks', '372e048')}}, 'type': 'audio'})
('automatic-speech-recognition', {'impl': <class 'transformers.pipelines.automatic_speech_recognition.AutomaticSpeechRecognitionPipeline'>, 'tf': (), 'pt': (<class 'transformers.models.auto.modeling_auto.AutoModelForCTC'>, <class 'transformers.models.auto.modeling_auto.AutoModelForSpeechSeq2Seq'>), 'default': {'model': {'pt': ('facebook/wav2vec2-base-960h', '55bb623')}}, 'type': 'multimodal'})
('feature-extraction', {'impl': <class 'transformers.pipelines.feature_extraction.FeatureExtractionPipeline'>, 'tf': (), 'pt': (<class 'transformers.models.auto.modeling_auto.AutoModel'>,), 'default': {'model': {'pt': ('distilbert-base-cased', '935ac13'), 'tf': ('distilbert-base-cased', '9
```
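
The raw dump is hard to scan, so one option is to reduce each entry to its task name, its `type`, and its default PyTorch checkpoint. The sketch below does this on a hand-copied excerpt of the output above (class objects omitted), so it runs without `transformers` installed. `summarize_tasks` is a hypothetical helper, and it is only an assumption that every entry in the real `SUPPORTED_TASKS` follows the layout seen in this excerpt, which is why missing keys are tolerated.

```python
# Hand-copied excerpt of the printed output above; the 'impl', 'tf' and 'pt'
# class entries are left out so the example needs no third-party imports.
EXCERPT = {
    "audio-classification": {
        "default": {"model": {"pt": ("superb/wav2vec2-base-superb-ks", "372e048")}},
        "type": "audio",
    },
    "automatic-speech-recognition": {
        "default": {"model": {"pt": ("facebook/wav2vec2-base-960h", "55bb623")}},
        "type": "multimodal",
    },
}


def summarize_tasks(tasks):
    """Return (task, type, default PyTorch checkpoint or None), sorted by task."""
    rows = []
    for name, spec in sorted(tasks.items()):
        # Tolerate entries without a default 'pt' model instead of raising KeyError
        default_pt = spec.get("default", {}).get("model", {}).get("pt")
        rows.append((name, spec.get("type"), default_pt[0] if default_pt else None))
    return rows


for task, kind, checkpoint in summarize_tasks(EXCERPT):
    print(f"{task:30} {str(kind):12} {checkpoint}")
```

With `transformers` installed, the same helper could be tried on the real dictionary via `summarize_tasks(SUPPORTED_TASKS)`.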

#### Creating and Using a Pipeline (Example)

- Browse the [Models](https://huggingface.co/models) section of Hugging Face to find a model to use.
- The model's detail page shows its description and an interactive demo.

![20230716120431](https://michael-1313341240.cos.ap-shanghai.myqcloud.com/20230716120431.png)

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

# model: the NLP model this task will use
model = AutoModelForSequenceClassification.from_pretrained('uer/roberta-base-finetuned-chinanews-chinese')
# tokenizer: the tokenizer for the task (covered in detail later)
tokenizer = AutoTokenizer.from_pretrained('uer/roberta-base-finetuned-chinanews-chinese')
# pipeline(...) returns the packaged tool, named pipe here.
# 'sentiment-analysis' is an alias of 'text-classification'; this checkpoint
# classifies news topics, so the label below is a topic, not a sentiment.
pipe = pipeline('sentiment-analysis', model=model, tokenizer=tokenizer)
# Call pipe with a suitable input to get the output of the NLP task
pipe("暴雪被微软收购了！")  # "Blizzard has been acquired by Microsoft!"
```

Output:

```text
[{'label': 'financial news', 'score': 0.5913024544715881}]
```
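
The pipeline returns a list with one dict per input, each holding a `label` and a `score`. As a small sketch of consuming that result, the snippet below hard-codes the output shown above instead of calling the model, so it runs without downloading anything. `describe` is a hypothetical helper name.

```python
# Output copied from the run above (one dict per input text)
results = [{'label': 'financial news', 'score': 0.5913024544715881}]


def describe(prediction):
    """Format one prediction as 'label (score as a percentage)'."""
    return f"{prediction['label']} ({prediction['score']:.1%})"


for prediction in results:
    print(describe(prediction))  # financial news (59.1%)
```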

**Accelerating inference with a GPU**

Passing `device=0`, as in `pipeline(..., device=0)`, explicitly places the pipeline on the first GPU (index 0).
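
A hard-coded `device=0` is expected to fail on a machine without a CUDA GPU. The following is a minimal sketch of a fallback, assuming PyTorch is the backend: `pick_device` is a hypothetical helper that returns `0` when CUDA is available and `-1` otherwise (`-1` is the integer `pipeline` uses for the CPU). Returning `-1` when `torch` is not installed is a choice made for this sketch.

```python
def pick_device():
    """Return 0 for the first CUDA GPU, or -1 for the CPU."""
    try:
        import torch
    except ImportError:
        # Assumption for this sketch: without PyTorch, fall back to the CPU index
        return -1
    return 0 if torch.cuda.is_available() else -1


device = pick_device()
print(f"pipeline device index: {device}")
# Usage with the objects created earlier (not executed here):
# pipe_auto = pipeline('sentiment-analysis', model=model, tokenizer=tokenizer, device=device)
```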

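The following compares the average latency of 1000 calls on the CPU and on the GPU:

```python
import time

import torch

text = "暴雪被微软收购了！"

# CPU inference: timed first, while the shared model object is still on the CPU
# (this assumes `pipe` was created without a device argument and runs on the CPU)
times = []
for _ in range(1000):
    start = time.time()
    pipe(text)
    times.append(time.time() - start)
print(f"CPU:{sum(times) / len(times)}")

# GPU inference: device=0 moves the same model object to the first GPU
pipe_gpu = pipeline('sentiment-analysis', model=model, tokenizer=tokenizer, device=0)
times = []  # reset so the GPU average does not include the CPU timings
for _ in range(1000):
    torch.cuda.synchronize()  # wait for pending GPU work before starting the clock
    start = time.time()
    pipe_gpu(text)
    torch.cuda.synchronize()  # wait for the GPU to finish before stopping the clock
    times.append(time.time() - start)
print(f"GPU:{sum(times) / len(times)}")
```

Output recorded from the original run. In that run the `times` list was not reset between the two loops, so the CPU figure is the GPU and CPU averages added together:

```text
GPU:0.005329703807830811
CPU:0.01029831838607788
```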
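
The two timing loops follow the same pattern, so they can be folded into one helper. The sketch below is one way to do that, not the method used to produce the numbers above. `average_latency` is a hypothetical function: it makes a few untimed warm-up calls (an addition of this sketch, since first calls often carry one-off setup cost), then averages `time.perf_counter` measurements, calling an optional `sync` hook before and after each timed call. A stand-in workload is used so the snippet runs without a model.

```python
import time


def average_latency(fn, n=1000, warmup=10, sync=None):
    """Average wall-clock seconds per call of fn() over n timed runs."""
    for _ in range(warmup):
        fn()  # untimed warm-up calls
    total = 0.0
    for _ in range(n):
        if sync is not None:
            sync()  # e.g. torch.cuda.synchronize for GPU work
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()
        total += time.perf_counter() - start
    return total / n


# Stand-in workload so the example is self-contained
calls = []
avg = average_latency(lambda: calls.append(sum(range(1000))), n=100, warmup=5)
print(f"average: {avg:.6f}s over 100 runs ({len(calls)} calls including warm-up)")
# With the pipelines above (not executed here):
# average_latency(lambda: pipe("暴雪被微软收购了！"))
# average_latency(lambda: pipe_gpu("暴雪被微软收购了！"), sync=torch.cuda.synchronize)
```

Keeping the synchronization as an injected hook lets the same function time CPU and GPU pipelines without importing `torch` in the helper itself.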
