# Hugging Face Transformers

1. Text Classification
You can use the pipeline function for quick experimentation:

In [None]:
from transformers import pipeline

classifier = pipeline("text-classification")
result = classifier("I love using Hugging Face Transformers!")
print(result)


This returns a list with one dictionary per input, each holding the predicted sentiment label and a confidence score.
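
To make the shape of that output concrete, here is a minimal sketch of how raw class logits could be turned into a pipeline-style `{"label", "score"}` dictionary with a softmax. The helper name `logits_to_prediction`, the label order, and the logits are assumptions for illustration, not the library's internals.

```python
import math

# Assumed label order for a binary sentiment model (hypothetical).
LABELS = ["NEGATIVE", "POSITIVE"]

def logits_to_prediction(logits, labels=LABELS):
    """Turn raw class logits into a pipeline-style {"label", "score"} dict."""
    # Softmax, subtracting the max logit for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Report the most probable class and its probability.
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": labels[best], "score": probs[best]}

# Made-up logits, for illustration only.
prediction = logits_to_prediction([-2.0, 3.0])
print(prediction)
```

The score is the softmax probability of the winning class, so it always lies between 0 and 1.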

To fine-tune a text classification model:


In [None]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification, Trainer, TrainingArguments
from datasets import load_dataset

# Load dataset and model
dataset = load_dataset("imdb")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")

# Tokenize the dataset; pad to a fixed length so the default collator can stack every example
def tokenize_function(examples):
    return tokenizer(examples["text"], truncation=True, padding="max_length")

tokenized_datasets = dataset.map(tokenize_function, batched=True)

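# Fine-tuning: evaluate at the end of every epoch
# (newer transformers releases name this argument eval_strategy)
training_args = TrainingArguments(output_dir="./results", evaluation_strategy="epoch")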
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["test"]
)

trainer.train()
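
With the setup above, evaluation reports only the loss. As a sketch, an accuracy metric could be supplied through the `compute_metrics` argument of `Trainer`, which receives the logits and labels for the evaluation set. The function below is written in plain Python so it can be tried without a model; the toy logits and labels are made up.

```python
def compute_metrics(eval_pred):
    """Compute accuracy from a (logits, labels) pair."""
    logits, labels = eval_pred
    # Argmax over the class dimension for each example.
    predictions = [max(range(len(row)), key=lambda i: row[i]) for row in logits]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return {"accuracy": correct / len(labels)}

# Stand-in values, for illustration only.
toy_logits = [[0.1, 0.9], [2.0, -1.0], [0.3, 0.4]]
toy_labels = [1, 0, 0]
metrics = compute_metrics((toy_logits, toy_labels))
print(metrics)
```

To use it during training, pass `compute_metrics=compute_metrics` when constructing the `Trainer`.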

2. Named Entity Recognition (NER)
For Named Entity Recognition, you can use:

In [None]:
from transformers import pipeline

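# aggregation_strategy="simple" merges word pieces into whole entities
# (it replaces the deprecated grouped_entities=True)
ner = pipeline("ner", aggregation_strategy="simple")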
result = ner("Hugging Face Inc. is a company based in New York City.")
print(result)
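
Entity grouping means merging consecutive tokens that belong to the same entity into one span. The following is a simplified sketch of that idea over B-/I-/O tags; the function `group_entities` and the hand-written tags are hypothetical, and the pipeline's real aggregation also handles subword pieces and scores.

```python
def group_entities(tokens, tags):
    """Merge B-/I- tagged tokens into (entity_type, text) spans."""
    spans = []
    current_type, current_words = None, []

    def flush():
        # Close the entity being built, if any.
        nonlocal current_type, current_words
        if current_words:
            spans.append((current_type, " ".join(current_words)))
        current_type, current_words = None, []

    for token, tag in zip(tokens, tags):
        if tag == "O":
            flush()
            continue
        prefix, _, entity_type = tag.partition("-")
        # A B- tag or a change of type starts a new entity.
        if prefix == "B" or entity_type != current_type:
            flush()
            current_type = entity_type
        current_words.append(token)
    flush()
    return spans

# Hand-written tags, for illustration only.
tokens = ["Hugging", "Face", "Inc.", "is", "based", "in", "New", "York", "City", "."]
tags = ["B-ORG", "I-ORG", "I-ORG", "O", "O", "O", "B-LOC", "I-LOC", "I-LOC", "O"]
entities = group_entities(tokens, tags)
print(entities)
```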

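To load the underlying tokenizer and model directly, for example as a starting point for fine-tuning:

In [None]:
from transformers import AutoTokenizer, AutoModelForTokenClassification, Trainer, TrainingArguments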

tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-large-cased-finetuned-conll03-english")
model = AutoModelForTokenClassification.from_pretrained("dbmdz/bert-large-cased-finetuned-conll03-english")


3. Text Generation
To generate text using GPT-2:



In [None]:
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
result = generator("Once upon a time", max_length=50, num_return_sequences=1)
print(result)

# If you want to fine-tune GPT-2, you can use:

from transformers import GPT2LMHeadModel, GPT2Tokenizer, Trainer, TrainingArguments

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
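
As a toy sketch of the autoregressive loop behind text generation, the example below extends a prompt one token at a time until `max_length` total tokens are reached (prompt included). The lookup table `NEXT_TOKEN` is a hypothetical stand-in for the model; GPT-2 instead predicts a probability distribution over its vocabulary at each step.

```python
# Hypothetical next-token table standing in for a language model.
NEXT_TOKEN = {
    "Once": "upon",
    "upon": "a",
    "a": "time",
    "time": "there",
    "there": "was",
    "was": "a",
}

def generate(prompt_tokens, max_length):
    """Greedily extend the prompt until max_length tokens or no continuation."""
    tokens = list(prompt_tokens)
    while len(tokens) < max_length:
        next_token = NEXT_TOKEN.get(tokens[-1])
        if next_token is None:
            break  # the toy table has no continuation for this token
        tokens.append(next_token)
    return tokens

generated = generate(["Once", "upon", "a", "time"], max_length=8)
print(" ".join(generated))
```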


4. Text Summarization
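For text summarization, use a sequence-to-sequence model such as BART or T5. When no model is specified, the pipeline falls back to a default summarization checkpoint: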

In [None]:
from transformers import pipeline

summarizer = pipeline("summarization")
text = """
Hugging Face Inc. is a company based in New York City. Its headquarters are in DUMBO, therefore very close to the Manhattan Bridge.
Hugging Face is known for its open-source library Transformers, which provides pre-trained models for natural language processing.
"""
result = summarizer(text, max_length=50, min_length=25, do_sample=False)
print(result)

5. Question Answering
For question answering, you can do:

In [None]:
from transformers import pipeline

question_answerer = pipeline("question-answering")
result = question_answerer(question="Where is Hugging Face based?", context="Hugging Face Inc. is based in New York City.")
print(result)
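
Extractive question answering selects a span of the context as the answer. A minimal sketch of that selection step, assuming the model has already produced one start score and one end score per context token: the function `best_span` and all scores below are made up for illustration.

```python
def best_span(start_scores, end_scores, max_answer_len=15):
    """Return the (start, end) token indices with the highest combined score."""
    best = (float("-inf"), 0, 0)
    for start, start_score in enumerate(start_scores):
        # Only consider ends at or after the start, within the length limit.
        last = min(start + max_answer_len, len(end_scores))
        for end in range(start, last):
            score = start_score + end_scores[end]
            if score > best[0]:
                best = (score, start, end)
    return best[1], best[2]

# Made-up scores, for illustration only.
context_tokens = ["Hugging", "Face", "Inc.", "is", "based", "in", "New", "York", "City", "."]
start_scores = [0.1, 0.0, 0.0, 0.0, 0.2, 0.3, 4.0, 0.5, 0.1, 0.0]
end_scores = [0.0, 0.1, 0.3, 0.0, 0.0, 0.1, 0.2, 1.0, 3.5, 0.2]

start, end = best_span(start_scores, end_scores)
answer = " ".join(context_tokens[start:end + 1])
print(answer)
```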

6. Translation Example
For translation, you can use a MarianMT model such as Helsinki-NLP/opus-mt-en-fr:

In [None]:
from transformers import pipeline

translator = pipeline("translation_en_to_fr", model="Helsinki-NLP/opus-mt-en-fr")
result = translator("Hello, how are you?")
print(result)

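7. Transfer Learning

Transfer learning takes a model pretrained on a large dataset (such as BERT or GPT-2) and fine-tunes it for a specific task. The fine-tuning code in section 1 is a typical example: a BERT-family model (DistilBERT) is fine-tuned for sentiment analysis.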


8. Zero-Shot Learning

Use a zero-shot classification model (for example, BART or RoBERTa fine-tuned on natural language inference) to classify text:

In [None]:
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
sequence_to_classify = "This is a great product, I love it!"
candidate_labels = ["positive", "negative", "neutral"]
result = classifier(sequence_to_classify, candidate_labels)
print(result)
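
Zero-shot classification with an NLI model works by turning each candidate label into a hypothesis and asking how strongly the input entails it. The sketch below shows only the scoring step, assuming one entailment logit per label is already available; the function `zero_shot_scores`, the hypothesis wording, and the logits are assumptions for illustration.

```python
import math

def zero_shot_scores(entailment_logits, candidate_labels):
    """Softmax the per-label entailment logits and rank labels by probability."""
    peak = max(entailment_logits)
    exps = [math.exp(x - peak) for x in entailment_logits]
    total = sum(exps)
    ranked = sorted(
        zip(candidate_labels, (e / total for e in exps)),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return {
        "labels": [label for label, _ in ranked],
        "scores": [score for _, score in ranked],
    }

candidate_labels = ["positive", "negative", "neutral"]
# Each label becomes a hypothesis that an NLI model would score against the input.
hypotheses = [f"This example is {label}." for label in candidate_labels]
# Made-up entailment logits, one per hypothesis.
ranking = zero_shot_scores([3.2, -1.5, 0.4], candidate_labels)
print(hypotheses)
print(ranking)
```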

9. Few-Shot Learning

With GPT-3 or a similar large language model, few-shot learning works by providing a few examples in the prompt:

In [None]:
from openai import OpenAI

# Requires openai>=1.0; the key can also be read from the OPENAI_API_KEY environment variable
client = OpenAI(api_key="your-api-key")

prompt = """
The following are examples of classifying movie reviews as positive or negative:

Review: "This movie was amazing, the acting was great!"
Label: Positive

Review: "The film was dull and boring."
Label: Negative

Review: "I had a great time watching it."
Label:
"""
# text-davinci-003 has been retired; substitute any completions-capable model you have access to
response = client.completions.create(
    model="gpt-3.5-turbo-instruct",
    prompt=prompt,
    max_tokens=5
)

print(response.choices[0].text.strip())
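
Writing the prompt by hand becomes tedious as the number of examples grows. As a sketch, a few-shot prompt in the same Review/Label layout could be assembled from a list of labeled examples; the helper `build_few_shot_prompt` is hypothetical.

```python
def build_few_shot_prompt(task_description, examples, query):
    """Assemble a few-shot prompt that ends with an unlabeled query."""
    lines = [task_description, ""]
    for review, label in examples:
        lines += [f'Review: "{review}"', f"Label: {label}", ""]
    # Leave the final label empty for the model to complete.
    lines += [f'Review: "{query}"', "Label:"]
    return "\n".join(lines)

examples = [
    ("This movie was amazing, the acting was great!", "Positive"),
    ("The film was dull and boring.", "Negative"),
]
few_shot_prompt = build_few_shot_prompt(
    "The following are examples of classifying movie reviews as positive or negative:",
    examples,
    "I had a great time watching it.",
)
print(few_shot_prompt)
```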

10. Training from Scratch

To train a Transformer model from scratch:

In [None]:
from transformers import AutoTokenizer, AutoConfig, AutoModelForSequenceClassification, Trainer, TrainingArguments
from datasets import load_dataset

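# Define the model configuration (architecture only, no pretrained weights)
config = AutoConfig.from_pretrained("bert-base-uncased")
# Build a randomly initialized model; Auto classes are created with from_config, not called directly
model = AutoModelForSequenceClassification.from_config(config)

# Load the dataset
dataset = load_dataset("imdb")
# The pretrained tokenizer's vocabulary is reused; only the model weights start from scratch
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Tokenize the data; pad to a fixed length so the default collator can stack every example
def tokenize_function(examples):
    return tokenizer(examples["text"], truncation=True, padding="max_length")

tokenized_datasets = dataset.map(tokenize_function, batched=True)

# Define the training arguments
training_args = TrainingArguments(output_dir="./results", evaluation_strategy="epoch")

# Train with Trainer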
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["test"]
)

trainer.train()

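More Challenging Task Types

Beyond the common tasks above, NLP also has some more challenging ones:

Multi-Task Learning: training a single model to perform several different kinds of tasks at once, for example text classification and named entity recognition together.

Domain Adaptation: transferring a model from one domain to another, which usually means coping with the differences between domains, for example moving from news data to medical data.

Reinforcement Learning for NLP: using reinforcement learning to optimize output quality in dialogue generation or summarization.

Meta Learning: learning how to learn, in order to improve model performance in few-shot settings.

Adversarial Training: training a model to be more robust to adversarial inputs, so that it is less affected by adversarial attacks.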