## Customizing the download path for transformers models

In [13]:
# Test the proxy
import os
import requests

# Set proxy environment variables
os.environ['HTTP_PROXY'] = 'http://127.0.0.1:7890'
os.environ['HTTPS_PROXY'] = 'http://127.0.0.1:7890'
os.environ['ALL_PROXY'] = 'socks5://127.0.0.1:7891'

# Test the proxy connection
try:
    response = requests.get('https://huggingface.co', timeout=10)
    print("✅ HuggingFace connection succeeded, status code:", response.status_code)
except Exception as e:
    print("❌ Connection failed:", e)

# Set the HuggingFace cache paths.
# huggingface_hub reads these when it is first imported, so they must be set
# before any `from transformers import ...` runs in this kernel.
os.environ['HF_HOME'] = '/home/KevinLiangX/Codes/LLM-quickstart-main/hf'
os.environ['HF_HUB_CACHE'] = '/home/KevinLiangX/Codes/LLM-quickstart-main/hf_hu'

✅ HuggingFace connection succeeded, status code: 200
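Because the cache variables only take effect when they are set before `huggingface_hub` is imported, it can help to wrap the setup in a function that warns when it runs too late. A minimal sketch: `configure_hf_cache` is a hypothetical helper, and it derives `HF_HUB_CACHE` as `<root>/hub` (the library's default layout under `HF_HOME`) instead of the custom `hf_hu` path used above.

```python
import os
import sys
import warnings
from pathlib import Path


def configure_hf_cache(root: str) -> dict:
    """Point the Hugging Face cache at `root` (hypothetical helper).

    huggingface_hub reads HF_HOME / HF_HUB_CACHE when it is first imported,
    so this has to run before `transformers` or `huggingface_hub` is imported.
    """
    if "huggingface_hub" in sys.modules or "transformers" in sys.modules:
        warnings.warn(
            "huggingface_hub is already imported; restart the kernel "
            "so the new cache paths take effect."
        )
    root_path = Path(root).expanduser().resolve()
    paths = {
        "HF_HOME": str(root_path),
        # Same layout as the library default: <HF_HOME>/hub
        "HF_HUB_CACHE": str(root_path / "hub"),
    }
    os.environ.update(paths)
    return paths
```

Call it in the first cell of the notebook, then import transformers.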


# Using the default model (distilbert-base-uncased-finetuned-sst-2-english)

In [14]:
from transformers import pipeline

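# With only the task given, the pipeline falls back to a default model (not recommended)
pipe = pipeline("sentiment-analysis")
pipe("今儿上海可真冷啊")  # "It's really cold in Shanghai today"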

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'label': 'NEGATIVE', 'score': 0.8957216739654541}]

In [15]:
from transformers import pipeline

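# With only the task given, the pipeline falls back to a default model (not recommended)
pipe = pipeline("sentiment-analysis")
pipe("今儿上海可真美啊")  # "Shanghai is really beautiful today"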

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'label': 'NEGATIVE', 'score': 0.8523963689804077}]

In [16]:
pipe("我觉得这家店蒜泥白肉的味道真好")  # "I think this restaurant's garlic-sauce pork tastes great"

[{'label': 'NEGATIVE', 'score': 0.9360455274581909}]

In [17]:
pipe("shanghai people is very nice")

[{'label': 'POSITIVE', 'score': 0.9998599290847778}]
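The warning recommends naming the model and revision explicitly. A minimal sketch that pins both, using the values printed in the log above; `load_sentiment_pipeline` and the two constants are hypothetical names, and the import is deferred so the proxy and cache variables can be set first. This checkpoint was fine-tuned on English sentences (SST-2), which likely explains why the Chinese inputs above all come back `NEGATIVE` regardless of their meaning.

```python
from typing import Any

# Values copied from the pipeline warning above.
SENTIMENT_MODEL = "distilbert-base-uncased-finetuned-sst-2-english"
SENTIMENT_REVISION = "af0f99b"


def load_sentiment_pipeline(model: str = SENTIMENT_MODEL,
                            revision: str = SENTIMENT_REVISION) -> Any:
    """Build a sentiment pipeline with the model and revision pinned."""
    # Imported here so HF_HOME / proxy variables can be set beforehand.
    from transformers import pipeline

    return pipeline("sentiment-analysis", model=model, revision=revision)
```

Usage: `pipe = load_sentiment_pipeline()` followed by `pipe("shanghai people is very nice")`.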

In [18]:
# Try hfl/chinese-electra-180g-small-discriminator to see how a smaller Chinese model performs

In [19]:
from transformers import pipeline

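# Specify the model explicitly. This checkpoint is a pre-trained discriminator with no
# sentiment head, so the classifier is newly initialized (see the warning below).
pipe = pipeline("sentiment-analysis",
                model="hfl/chinese-electra-180g-small-discriminator")
pipe("今儿上海可真冷啊")  # "It's really cold in Shanghai today"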

Some weights of ElectraForSequenceClassification were not initialized from the model checkpoint at hfl/chinese-electra-180g-small-discriminator and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


[{'label': 'LABEL_0', 'score': 0.5044581294059753}]

In [20]:
pipe("今天天气真好")  # "The weather is really nice today"

[{'label': 'LABEL_0', 'score': 0.503697395324707}]

In [21]:
pipe("去看唱会了")  # roughly "Went to a concert"

[{'label': 'LABEL_0', 'score': 0.5090354084968567}]
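Scores pinned near 0.5 on a generic `LABEL_0` are the typical symptom of a randomly initialized two-class head. A quick heuristic check, as a sketch: `looks_untrained` and its 0.05 tolerance are hypothetical choices, not part of the transformers API.

```python
import re
from typing import Dict, List


def looks_untrained(preds: List[Dict], num_labels: int = 2, tol: float = 0.05) -> bool:
    """Heuristic: generic LABEL_k names and scores close to 1/num_labels.

    Both symptoms together suggest the classification head was randomly
    initialized rather than fine-tuned.
    """
    if not preds:
        return False
    uniform = 1.0 / num_labels
    generic = all(re.fullmatch(r"LABEL_\d+", p["label"]) for p in preds)
    near_uniform = all(abs(p["score"] - uniform) <= tol for p in preds)
    return generic and near_uniform
```

It returns `True` for the ELECTRA outputs above and for the Whisper audio-classification outputs later in this notebook, and `False` for the SST-2 model's `NEGATIVE`/`POSITIVE` results.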

## Token Classification

In [22]:
from transformers import pipeline

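# google/mobilebert-uncased is a pre-trained encoder without an NER head: the classifier
# weights are newly initialized (see the warning below), so LABEL_0/LABEL_1 carry no meaning.
classifier = pipeline(task="ner", model="google/mobilebert-uncased")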

Some weights of MobileBertForTokenClassification were not initialized from the model checkpoint at google/mobilebert-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [23]:
preds = classifier("Hugging Face is a French company based in New York City.")
preds = [
    {
        "entity": pred["entity"],
        "score": round(pred["score"], 4),
        "index": pred["index"],
        "word": pred["word"],
        "start": pred["start"],
        "end": pred["end"],
    }
    for pred in preds
]
print(*preds, sep="\n")

Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.


{'entity': 'LABEL_0', 'score': 0.6631, 'index': 1, 'word': 'hugging', 'start': 0, 'end': 7}
{'entity': 'LABEL_0', 'score': 0.7542, 'index': 2, 'word': 'face', 'start': 8, 'end': 12}
{'entity': 'LABEL_1', 'score': 0.5751, 'index': 3, 'word': 'is', 'start': 13, 'end': 15}
{'entity': 'LABEL_0', 'score': 0.6119, 'index': 4, 'word': 'a', 'start': 16, 'end': 17}
{'entity': 'LABEL_1', 'score': 0.7222, 'index': 5, 'word': 'french', 'start': 18, 'end': 24}
{'entity': 'LABEL_1', 'score': 0.8517, 'index': 6, 'word': 'company', 'start': 25, 'end': 32}
{'entity': 'LABEL_1', 'score': 0.6151, 'index': 7, 'word': 'based', 'start': 33, 'end': 38}
{'entity': 'LABEL_0', 'score': 0.9183, 'index': 8, 'word': 'in', 'start': 39, 'end': 41}
{'entity': 'LABEL_0', 'score': 0.9915, 'index': 9, 'word': 'new', 'start': 42, 'end': 45}
{'entity': 'LABEL_0', 'score': 0.9124, 'index': 10, 'word': 'york', 'start': 46, 'end': 50}
{'entity': 'LABEL_0', 'score': 0.9308, 'index': 11, 'word': 'city', 'start': 51, 'end': 55}
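The post-processing above keeps one entry per token. To see how per-token labels collapse into spans, here is a simplified sketch that merges runs of consecutive tokens sharing a label. `group_adjacent` is a hypothetical helper, not the pipeline's own aggregation: it ignores `B-`/`I-` prefixes and simply averages the token scores.

```python
from typing import Dict, List


def group_adjacent(preds: List[Dict], text: str) -> List[Dict]:
    """Merge runs of consecutive tokens that share a label into spans.

    Span text is cut from `text` using the character offsets
    (`start`, `end`) that the pipeline returns for each token.
    """
    groups: List[Dict] = []
    for p in sorted(preds, key=lambda x: x["start"]):
        if groups and groups[-1]["entity_group"] == p["entity"]:
            groups[-1]["end"] = p["end"]
            groups[-1]["scores"].append(p["score"])
        else:
            groups.append({"entity_group": p["entity"], "start": p["start"],
                           "end": p["end"], "scores": [p["score"]]})
    for g in groups:
        scores = g.pop("scores")
        g["score"] = sum(scores) / len(scores)  # mean token score
        g["word"] = text[g["start"]:g["end"]]
    return groups
```

Applied to the per-token output above, this yields five spans: "Hugging Face", "is", "a", "French company based" and "in New York City".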

In [24]:
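# `grouped_entities=True` is deprecated; `aggregation_strategy="simple"` is its replacement.
# Calling pipeline() again re-initializes the untrained head at random, so the labels
# below need not match the per-token labels from the previous cell.
classifier = pipeline(task="ner", model="google/mobilebert-uncased", aggregation_strategy="simple")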
classifier("Hugging Face is a French company based in New York City.")

Some weights of MobileBertForTokenClassification were not initialized from the model checkpoint at google/mobilebert-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.


[{'entity_group': 'LABEL_1',
  'score': 0.74353963,
  'word': 'hugging face is a',
  'start': 0,
  'end': 17},
 {'entity_group': 'LABEL_0',
  'score': 0.5236963,
  'word': 'french',
  'start': 18,
  'end': 24},
 {'entity_group': 'LABEL_1',
  'score': 0.90481365,
  'word': 'company',
  'start': 25,
  'end': 32},
 {'entity_group': 'LABEL_0',
  'score': 0.5401115,
  'word': 'based',
  'start': 33,
  'end': 38},
 {'entity_group': 'LABEL_1',
  'score': 0.5478684,
  'word': 'in',
  'start': 39,
  'end': 41},
 {'entity_group': 'LABEL_0',
  'score': 0.74460316,
  'word': 'new',
  'start': 42,
  'end': 45},
 {'entity_group': 'LABEL_1',
  'score': 0.8416981,
  'word': 'york city.',
  'start': 46,
  'end': 56}]

## Question Answering

In [25]:
from transformers import pipeline

question_answerer = pipeline(task="question-answering", model="distilbert-base-cased-distilled-squad")

In [26]:
preds = question_answerer(
    question="What is the name of the repository?",
    context="The name of the repository is huggingface/transformers",
)
print(
    f"score: {round(preds['score'], 4)}, start: {preds['start']}, end: {preds['end']}, answer: {preds['answer']}"
)

score: 0.9327, start: 30, end: 54, answer: huggingface/transformers


In [27]:
preds = question_answerer(
    question="What is the capital of China?",
    context="On 1 October 1949, CCP Chairman Mao Zedong formally proclaimed the People's Republic of China in Tiananmen Square, Beijing.",
)
print(
    f"score: {round(preds['score'], 4)}, start: {preds['start']}, end: {preds['end']}, answer: {preds['answer']}"
)

score: 0.9458, start: 115, end: 122, answer: Beijing


In [28]:
preds = question_answerer(
    question="What is your favor?",
    context="football,basketball",
)
print(
    f"score: {round(preds['score'], 4)}, start: {preds['start']}, end: {preds['end']}, answer: {preds['answer']}"
)

score: 0.4857, start: 0, end: 8, answer: football


In [29]:
preds = question_answerer(
    question="你在哪",  # "Where are you?"
    context="西安",  # "Xi'an"
)
print(
    f"score: {round(preds['score'], 4)}, start: {preds['start']}, end: {preds['end']}, answer: {preds['answer']}"
)

score: 0.3516, start: 0, end: 2, answer: 西安


In [30]:
preds = question_answerer(
    question="讲一个鬼故事",  # "Tell a ghost story"
    context="oh  oh  oh ,no no no  ",
)
print(
    f"score: {round(preds['score'], 4)}, start: {preds['start']}, end: {preds['end']}, answer: {preds['answer']}"
)

score: 0.2166, start: 12, end: 20, answer: no no no
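Extractive QA always returns *some* span of the context, even when the question cannot be answered from it, as the ghost-story example shows; only the score drops. A minimal sketch that rejects low-scoring answers and double-checks the offsets. `checked_answer` is a hypothetical helper, and the 0.5 threshold is an arbitrary assumption that would need tuning on real data.

```python
from typing import Dict, Optional


def checked_answer(pred: Dict, context: str, min_score: float = 0.5) -> Optional[str]:
    """Return the extracted answer, or None when the score is below `min_score`.

    `start`/`end` are character offsets into `context`, so the span can be
    re-extracted and compared with `pred["answer"]`.
    """
    span = context[pred["start"]:pred["end"]]
    if span != pred["answer"]:
        raise ValueError(f"offsets {pred['start']}:{pred['end']} do not match {pred['answer']!r}")
    return span if pred["score"] >= min_score else None
```

With this cutoff the football (0.4857) and Xi'an (0.3516) answers are rejected too, even though the latter is arguably correct, which illustrates the cost of a fixed threshold.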


## Text Summarization

In [31]:
from transformers import pipeline

summarizer = pipeline(task="summarization",
                      model="t5-small",
                      min_length=8,
                      max_length=15,
)

In [32]:
summarizer(
    """
    In this work, we presented the Transformer, the first sequence transduction model based entirely on attention, 
    replacing the recurrent layers most commonly used in encoder-decoder architectures with multi-headed self-attention. 
    For translation tasks, the Transformer can be trained significantly faster than architectures based on recurrent or convolutional layers. 
    On both WMT 2014 English-to-German and WMT 2014 English-to-French translation tasks, we achieve a new state of the art. 
    In the former task our best model outperforms even all previously reported ensembles.
    """
)

[{'summary_text': 'the Transformer is the first sequence transduction model based entirely on attention'}]

In [33]:
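# Chinese input, roughly: "Audio and speech tasks differ from other modalities because audio is a continuous signal..."
summarizer(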
    """
   音频和语音处理任务与其他模态略有不同，主要是因为音频作为输入是一个连续的信号。与文本不同，
   原始音频波形不能像句子可以被划分为单词那样被整齐地分割成离散的块。为了解决这个问题，通常在固定
   的时间间隔内对原始音频信号进行采样。如果在每个时间间隔内采样更多样本，采样率就会更高，音频更接
   近原始音频源。
    """
)

[{'summary_text': ',, .  .'}]

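In [34]:
## Can't it handle Chinese? Most likely not: t5-small was pre-trained mainly on English text and its vocabulary covers very few Chinese characters, so the input above is largely reduced to unknown tokens.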
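When text of mixed languages flows through the same code, a cheap guard can catch Chinese input before it reaches an English-only model such as t5-small. A sketch under stated assumptions: `cjk_ratio` and `check_english_only` are hypothetical helpers, only the basic CJK Unified Ideographs range (U+4E00 to U+9FFF) is counted, and the 0.2 cutoff is arbitrary.

```python
def cjk_ratio(text: str) -> float:
    """Fraction of non-whitespace characters in the CJK Unified Ideographs block."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    cjk = sum(1 for c in chars if "\u4e00" <= c <= "\u9fff")
    return cjk / len(chars)


def check_english_only(text: str, max_cjk: float = 0.2) -> None:
    """Raise before sending mostly-Chinese text to an English-only model."""
    ratio = cjk_ratio(text)
    if ratio > max_cjk:
        raise ValueError(f"{ratio:.0%} of the input is Chinese; t5-small is unlikely to handle it")
```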

## Audio Processing Tasks

In [35]:
from transformers import pipeline

classifier = pipeline(task="audio-classification", model="superb/hubert-base-superb-er")


Some weights of the model checkpoint at superb/hubert-base-superb-er were not used when initializing HubertForSequenceClassification: ['hubert.encoder.pos_conv_embed.conv.weight_g', 'hubert.encoder.pos_conv_embed.conv.weight_v']
- This IS expected if you are initializing HubertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing HubertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of HubertForSequenceClassification were not initialized from the model checkpoint at superb/hubert-base-superb-er and are newly initialized: ['hubert.encoder.pos_conv_embed.conv.parametrizations.weight.original0', 'hubert.encoder.pos_conv_embed.conv.parametriza

In [36]:
# Use a test file hosted on Hugging Face Datasets
preds = classifier("https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/mlk.flac")
preds = [{"score": round(pred["score"], 4), "label": pred["label"]} for pred in preds]
preds

[{'score': 0.4532, 'label': 'hap'},
 {'score': 0.3622, 'label': 'sad'},
 {'score': 0.0943, 'label': 'neu'},
 {'score': 0.0903, 'label': 'ang'}]

In [37]:
# Test with a local audio file
preds = classifier("data/audio/mlk.flac")
preds = [{"score": round(pred["score"], 4), "label": pred["label"]} for pred in preds]
preds


[{'score': 0.4532, 'label': 'hap'},
 {'score': 0.3622, 'label': 'sad'},
 {'score': 0.0943, 'label': 'neu'},
 {'score': 0.0903, 'label': 'ang'}]
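The emotion labels are abbreviations. A small sketch that expands them and formats the scores; the mapping is an assumption based on the four IEMOCAP emotion classes (neutral, happy, angry, sad) used by the SUPERB emotion-recognition task, and `readable_emotions` is a hypothetical helper. Unknown labels are passed through unchanged.

```python
from typing import Dict, List

# Assumed expansion of the abbreviations used by superb/hubert-base-superb-er.
EMOTION_NAMES = {"neu": "neutral", "hap": "happy", "ang": "angry", "sad": "sad"}


def readable_emotions(preds: List[Dict]) -> List[str]:
    """Format predictions as 'happy: 45.3%' style strings, best first."""
    ranked = sorted(preds, key=lambda p: p["score"], reverse=True)
    return [f"{EMOTION_NAMES.get(p['label'], p['label'])}: {p['score']:.1%}" for p in ranked]
```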

# Try another model: openai/whisper-tiny

In [38]:
from transformers import pipeline

classifier = pipeline(task="audio-classification", model="openai/whisper-tiny")

Some weights of WhisperForAudioClassification were not initialized from the model checkpoint at openai/whisper-tiny and are newly initialized: ['model.classifier.bias', 'model.classifier.weight', 'model.projector.bias', 'model.projector.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [39]:
# Use a test file hosted on Hugging Face Datasets
preds = classifier("https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/mlk.flac")
preds = [{"score": round(pred["score"], 4), "label": pred["label"]} for pred in preds]
preds

[{'score': 0.502, 'label': 'LABEL_1'}, {'score': 0.498, 'label': 'LABEL_0'}]

In [40]:
# Test with a local audio file
preds = classifier("data/audio/mlk.flac")
preds = [{"score": round(pred["score"], 4), "label": pred["label"]} for pred in preds]
preds


[{'score': 0.502, 'label': 'LABEL_1'}, {'score': 0.498, 'label': 'LABEL_0'}]

In [41]:
from transformers import pipeline

classifier = pipeline(task="audio-classification", model="openai/whisper-base")

Some weights of WhisperForAudioClassification were not initialized from the model checkpoint at openai/whisper-base and are newly initialized: ['model.classifier.bias', 'model.classifier.weight', 'model.projector.bias', 'model.projector.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [42]:
# Use a test file hosted on Hugging Face Datasets
preds = classifier("https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/mlk.flac")
preds = [{"score": round(pred["score"], 4), "label": pred["label"]} for pred in preds]
preds

[{'score': 0.5054, 'label': 'LABEL_0'}, {'score': 0.4946, 'label': 'LABEL_1'}]

In [43]:
# Test with a local audio file
preds = classifier("data/audio/mlk.flac")
preds = [{"score": round(pred["score"], 4), "label": pred["label"]} for pred in preds]
preds

[{'score': 0.5054, 'label': 'LABEL_0'}, {'score': 0.4946, 'label': 'LABEL_1'}]

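In [44]:
## These results are clearly unusable. As suspected, this comes down to training: the warnings show that `model.classifier` and `model.projector` are newly initialized, because the openai/whisper-* checkpoints are trained for speech recognition, not audio classification, so the untrained head splits the probability roughly 50/50 between generic labels.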

## Automatic Speech Recognition

In [45]:
from transformers import pipeline

# Specify the model with the `model` argument
transcriber = pipeline(task="automatic-speech-recognition", model="openai/whisper-small")

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [46]:
text = transcriber("data/audio/mlk.flac")
text

{'text': ' I have a dream that one day this nation will rise up and live out the true meaning of its creed.'}

In [47]:
from transformers import pipeline

# Specify the model with the `model` argument
transcriber = pipeline(task="automatic-speech-recognition", model="openai/whisper-tiny")

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [48]:
text = transcriber("data/audio/mlk.flac")
text

{'text': ' I have a dream. Good one day. This nation will rise up. Live out the true meaning of its dream.'}

In [49]:
from transformers import pipeline

# Specify the model with the `model` argument
transcriber = pipeline(task="automatic-speech-recognition", model="openai/whisper-base")

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [50]:
text = transcriber("data/audio/mlk.flac")
text

{'text': ' I have a dream that one day this nation will rise up and live out the true meaning of its creed.'}
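Comparing transcripts by eye is imprecise. Word error rate (WER), the usual ASR metric, counts substitutions, deletions and insertions against a reference transcript, divided by the reference length. A minimal, dependency-free sketch: here the whisper-small output stands in as the reference, which is an assumption (it is a model transcript, not a verified ground truth), and the normalization (lowercase, punctuation dropped) is a simplification.

```python
import re
from typing import List


def words(text: str) -> List[str]:
    """Lowercase and keep only word characters (drops punctuation)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = words(reference), words(hypothesis)
    # prev[j] = edit distance between ref[:i-1] and hyp[:j] (rolling DP row)
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1] / max(len(ref), 1)
```

Against the whisper-small transcript, whisper-base scores a WER of 0, while whisper-tiny differs on 3 of 21 words ("that" to "Good", a dropped "and", "creed" to "dream"), a WER of about 0.14.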

## Computer Vision


In [51]:
from transformers import pipeline

classifier = pipeline(task="image-classification")

No model was supplied, defaulted to google/vit-base-patch16-224 and revision 5dca96d (https://huggingface.co/google/vit-base-patch16-224).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [52]:
preds = classifier(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg"
)
preds = [{"score": round(pred["score"], 4), "label": pred["label"]} for pred in preds]
print(*preds, sep="\n")

{'score': 0.4335, 'label': 'lynx, catamount'}
{'score': 0.0348, 'label': 'cougar, puma, catamount, mountain lion, painter, panther, Felis concolor'}
{'score': 0.0324, 'label': 'snow leopard, ounce, Panthera uncia'}
{'score': 0.0239, 'label': 'Egyptian cat'}
{'score': 0.0229, 'label': 'tiger cat'}


In [53]:
# Use a local image (the "lynx" cat)
preds = classifier(
    "data/image/cat-chonk.jpeg"
)
preds = [{"score": round(pred["score"], 4), "label": pred["label"]} for pred in preds]
print(*preds, sep="\n")

{'score': 0.4335, 'label': 'lynx, catamount'}
{'score': 0.0348, 'label': 'cougar, puma, catamount, mountain lion, painter, panther, Felis concolor'}
{'score': 0.0324, 'label': 'snow leopard, ounce, Panthera uncia'}
{'score': 0.0239, 'label': 'Egyptian cat'}
{'score': 0.0229, 'label': 'tiger cat'}


In [54]:
# Use a local image (panda)
preds = classifier(
    "data/image/panda.jpg"
)
preds = [{"score": round(pred["score"], 4), "label": pred["label"]} for pred in preds]
print(*preds, sep="\n")

{'score': 0.9962, 'label': 'giant panda, panda, panda bear, coon bear, Ailuropoda melanoleuca'}
{'score': 0.0018, 'label': 'lesser panda, red panda, panda, bear cat, cat bear, Ailurus fulgens'}
{'score': 0.0002, 'label': 'ice bear, polar bear, Ursus Maritimus, Thalarctos maritimus'}
{'score': 0.0001, 'label': 'sloth bear, Melursus ursinus, Ursus ursinus'}
{'score': 0.0001, 'label': 'brown bear, bruin, Ursus arctos'}
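ImageNet labels are comma-separated synonym lists, and the gap between the top two scores says more about confidence than the top score alone. A small sketch; `top1_summary` is a hypothetical helper.

```python
from typing import Dict, List, Tuple


def top1_summary(preds: List[Dict]) -> Tuple[str, float, float]:
    """Return (short top-1 label, top-1 score, gap to the runner-up).

    Keeps only the first synonym of the label. A large gap means the model
    clearly favors one class over the alternatives.
    """
    ranked = sorted(preds, key=lambda p: p["score"], reverse=True)
    best = ranked[0]
    runner_up = ranked[1]["score"] if len(ranked) > 1 else 0.0
    return best["label"].split(",")[0].strip(), best["score"], best["score"] - runner_up
```

For the panda above this gives ("giant panda", 0.9962, 0.9944); for the cat, ("lynx", 0.4335, 0.3987).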


## Trying other, smaller models

In [55]:
from transformers import pipeline

classifier = pipeline(task="image-classification", model="facebook/deit-small-patch16-224")

In [56]:
preds = classifier(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg"
)
preds = [{"score": round(pred["score"], 4), "label": pred["label"]} for pred in preds]
print(*preds, sep="\n")

{'score': 0.4373, 'label': 'lynx, catamount'}
{'score': 0.0268, 'label': 'spaghetti squash'}
{'score': 0.0199, 'label': 'banana'}
{'score': 0.0182, 'label': 'lemon'}
{'score': 0.0174, 'label': 'sulphur butterfly, sulfur butterfly'}


In [57]:
# Use a local image (the "lynx" cat)
preds = classifier(
    "data/image/cat-chonk.jpeg"
)
preds = [{"score": round(pred["score"], 4), "label": pred["label"]} for pred in preds]
print(*preds, sep="\n")

{'score': 0.4373, 'label': 'lynx, catamount'}
{'score': 0.0268, 'label': 'spaghetti squash'}
{'score': 0.0199, 'label': 'banana'}
{'score': 0.0182, 'label': 'lemon'}
{'score': 0.0174, 'label': 'sulphur butterfly, sulfur butterfly'}


In [58]:
# Use a local image (panda)
preds = classifier(
    "data/image/panda.jpg"
)
preds = [{"score": round(pred["score"], 4), "label": pred["label"]} for pred in preds]
print(*preds, sep="\n")

{'score': 0.2693, 'label': 'giant panda, panda, panda bear, coon bear, Ailuropoda melanoleuca'}
{'score': 0.0313, 'label': 'spaghetti squash'}
{'score': 0.0206, 'label': 'dough'}
{'score': 0.0157, 'label': 'mashed potato'}
{'score': 0.0138, 'label': 'eggnog'}


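In [59]:
## Not bad: deit-small still ranks the right class first on both images, but with far lower confidence on the panda (0.2693 vs 0.9962 for ViT) and with unrelated runner-up labels.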

## Object Detection

In [60]:
from transformers import pipeline

detector = pipeline(task="object-detection")

No model was supplied, defaulted to facebook/detr-resnet-50 and revision 2729413 (https://huggingface.co/facebook/detr-resnet-50).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at facebook/detr-resnet-50 were not used when initializing DetrForObjectDetection: ['model.backbone.conv_encoder.model.layer1.0.downsample.1.num_batches_tracked', 'model.backbone.conv_encoder.model.layer2.0.downsample.1.num_batches_tracked', 'model.backbone.conv_encoder.model.layer3.0.downsample.1.num_batches_tracked', 'model.backbone.conv_encoder.model.layer4.0.downsample.1.num_batches_tracked']
- This IS expected if you are initializing DetrForObjectDetection from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DetrForObjectDetection from the checkpoint of a 

In [61]:
preds = detector(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg"
)
preds = [{"score": round(pred["score"], 4), "label": pred["label"], "box": pred["box"]} for pred in preds]
preds

[{'score': 0.9864,
  'label': 'cat',
  'box': {'xmin': 178, 'ymin': 154, 'xmax': 882, 'ymax': 598}}]

In [62]:
preds = detector(
    "data/image/cat_dog.jpg"
)
preds = [{"score": round(pred["score"], 4), "label": pred["label"], "box": pred["box"]} for pred in preds]
preds

[{'score': 0.9985,
  'label': 'cat',
  'box': {'xmin': 78, 'ymin': 57, 'xmax': 309, 'ymax': 371}},
 {'score': 0.989,
  'label': 'dog',
  'box': {'xmin': 279, 'ymin': 20, 'xmax': 482, 'ymax': 416}}]

## Trying YOLOS (hustvl/yolos-small), a ViT-based detector

In [63]:
from transformers import pipeline

detector = pipeline(task="object-detection",
                    model="hustvl/yolos-small")

In [64]:
preds = detector(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg"
)
preds = [{"score": round(pred["score"], 4), "label": pred["label"], "box": pred["box"]} for pred in preds]
preds

[{'score': 0.9986,
  'label': 'cat',
  'box': {'xmin': 184, 'ymin': 161, 'xmax': 871, 'ymax': 599}}]

In [65]:
preds = detector(
    "data/image/cat_dog.jpg"
)
preds = [{"score": round(pred["score"], 4), "label": pred["label"], "box": pred["box"]} for pred in preds]
preds

[{'score': 0.996,
  'label': 'dog',
  'box': {'xmin': 284, 'ymin': 22, 'xmax': 486, 'ymax': 415}},
 {'score': 0.9896,
  'label': 'cat',
  'box': {'xmin': 73, 'ymin': 66, 'xmax': 295, 'ymax': 372}}]
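To compare the two detectors beyond "both found a cat", the overlap of their boxes can be measured with intersection over union (IoU), the standard box-agreement measure. A minimal sketch; `box_iou` is a hypothetical helper that takes boxes in the `{'xmin', 'ymin', 'xmax', 'ymax'}` form the pipeline returns.

```python
from typing import Dict


def box_iou(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Intersection over union of two {'xmin','ymin','xmax','ymax'} boxes."""
    ix = max(0, min(a["xmax"], b["xmax"]) - max(a["xmin"], b["xmin"]))
    iy = max(0, min(a["ymax"], b["ymax"]) - max(a["ymin"], b["ymin"]))
    inter = ix * iy
    area_a = (a["xmax"] - a["xmin"]) * (a["ymax"] - a["ymin"])
    area_b = (b["xmax"] - b["xmin"]) * (b["ymax"] - b["ymin"])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

Using the boxes above, DETR and YOLOS agree closely: IoU is about 0.96 for the single cat, and about 0.89 (cat) and 0.95 (dog) in `cat_dog.jpg`.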

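## Overall: within one model family, more parameters generally give better results (e.g. whisper-tiny vs whisper-base/small for speech recognition); across families, results vary wildly and depend mostly on the training data, above all on whether the checkpoint was fine-tuned for the task at all.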