In [1]:
text = """Dear Amazon, last week I ordered an Optimus Prime action figure
    from your online store in Germany. Unfortunately, when I opened the package,
    I discovered to my horror that I had been sent an action figure of Megatron
    instead! As a lifelong enemy of the Decepticons, I hope you can understand my
    dilemma. To resolve the issue, I demand an exchange of Megatron for the
    Optimus Prime figure I ordered. Enclosed are copies of my records concerning
    this purchase. I expect to hear from you soon. Sincerely, Bumblebee."""

Next, we use the Hugging Face `transformers` API to classify the sentiment of the text above. The first run downloads the model weights, so expect a short wait.

In [2]:
from transformers import pipeline

classifier = pipeline('text-classification')

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
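
The warning above discourages relying on the default checkpoint. A minimal sketch of pinning it explicitly, using the model name and revision printed in the log. The constant names and the `build_classifier` helper are hypothetical, and the import is deferred so that nothing is loaded or downloaded until the helper is called:

```python
# Checkpoint name and revision copied from the log message above.
MODEL_ID = "distilbert-base-uncased-finetuned-sst-2-english"
REVISION = "af0f99b"


def build_classifier():
    """Return a text-classification pipeline pinned to a fixed checkpoint."""
    # Imported lazily: loading transformers and the weights is slow.
    from transformers import pipeline

    return pipeline("text-classification", model=MODEL_ID, revision=REVISION)
```

Calling `build_classifier()` should give a classifier equivalent to the one above, without depending on whatever default the library ships in a later release.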


In [6]:
outputs = classifier(text)
print(outputs)

[{'label': 'NEGATIVE', 'score': 0.9015464782714844}]


In [7]:
import pandas as pd

pd.DataFrame(outputs)

      label     score
0  NEGATIVE  0.901546


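We often also want to extract the names of entities from text, such as a service, an object, or a place. In natural language processing, extracting these names from text is called named entity recognition (NER). We can use the corresponding `pipeline` for this task.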

In [8]:
ner_tagger = pipeline('ner', aggregation_strategy='simple')
outputs = ner_tagger(text)
pd.DataFrame(outputs)

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.weight', 'bert.pooler.dense.bias']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identi

   entity_group     score           word  start  end
0           ORG  0.879011         Amazon      5   11
1          MISC  0.990859  Optimus Prime     36   49
2           LOC  0.999755        Germany     94  101
3          MISC  0.556571           Mega    216  220
4           PER  0.590255         ##tron    220  224
5           ORG  0.669693         Decept    265  271
6          MISC  0.498348        ##icons    271  276
7          MISC  0.775362       Megatron    366  374
8          MISC  0.987854  Optimus Prime    387  400
9           PER  0.812096      Bumblebee    526  535
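
In the table above, `Megatron` and `Decepticons` are each split into two word pieces (`Mega` + `##tron`, `Decept` + `##icons`) with different labels. One possible way to post-process this is sketched below. The `merge_wordpieces` helper is hypothetical, and keeping the label of the higher-scoring piece is an assumption made for illustration, not something the library does. The rows are copied from the table, keeping only the ones needed to show the merge:

```python
def merge_wordpieces(entities):
    """Glue '##' continuation pieces onto the entity that precedes them."""
    merged = []
    for ent in entities:
        is_continuation = (
            bool(merged)
            and ent["word"].startswith("##")
            and ent["start"] == merged[-1]["end"]  # pieces must be adjacent
        )
        if is_continuation:
            prev = merged[-1]
            # Assumption: keep the label of the more confident piece.
            if ent["score"] > prev["score"]:
                prev["entity_group"] = ent["entity_group"]
                prev["score"] = ent["score"]
            prev["word"] += ent["word"][2:]  # drop the '##' prefix
            prev["end"] = ent["end"]
        else:
            merged.append(dict(ent))  # copy so the input is not mutated
    return merged


rows = [
    {"entity_group": "LOC", "score": 0.999755, "word": "Germany", "start": 94, "end": 101},
    {"entity_group": "MISC", "score": 0.556571, "word": "Mega", "start": 216, "end": 220},
    {"entity_group": "PER", "score": 0.590255, "word": "##tron", "start": 220, "end": 224},
    {"entity_group": "ORG", "score": 0.669693, "word": "Decept", "start": 265, "end": 271},
    {"entity_group": "MISC", "score": 0.498348, "word": "##icons", "start": 271, "end": 276},
]
merged = merge_wordpieces(rows)
print([(e["word"], e["entity_group"], e["start"], e["end"]) for e in merged])
```

The `ner` pipeline also accepts other `aggregation_strategy` values, such as `'first'`, that group pieces at the word level; trying one of those may make this post-processing unnecessary.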


Next, we use the `question-answering` pipeline to answer a question about the text.

In [9]:
reader = pipeline('question-answering')
question = 'what does the customer want?'
outputs = reader(question=question, context=text)
pd.DataFrame([outputs])

No model was supplied, defaulted to distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.


      score  start  end                   answer
0  0.642405    351  374  an exchange of Megatron
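
The `start` and `end` columns are character offsets into the context string, so slicing the context recovers the answer. The sketch below rebuilds the same `text` with explicit newlines so the offsets line up, then widens the window to show the answer in its sentence. The `snippet` helper and its 40-character margin are hypothetical choices for illustration:

```python
text = (
    "Dear Amazon, last week I ordered an Optimus Prime action figure\n"
    "    from your online store in Germany. Unfortunately, when I opened the package,\n"
    "    I discovered to my horror that I had been sent an action figure of Megatron\n"
    "    instead! As a lifelong enemy of the Decepticons, I hope you can understand my\n"
    "    dilemma. To resolve the issue, I demand an exchange of Megatron for the\n"
    "    Optimus Prime figure I ordered. Enclosed are copies of my records concerning\n"
    "    this purchase. I expect to hear from you soon. Sincerely, Bumblebee."
)

# Row copied from the question-answering output above.
answer = {"score": 0.642405, "start": 351, "end": 374,
          "answer": "an exchange of Megatron"}

# Slicing the context with the offsets gives back the answer string.
span = text[answer["start"]:answer["end"]]


def snippet(context, start, end, margin=40):
    """Return the span plus some surrounding text, with whitespace collapsed."""
    lo = max(0, start - margin)
    hi = min(len(context), end + margin)
    return " ".join(context[lo:hi].split())


print(span)
print(snippet(text, answer["start"], answer["end"]))
```

The same slicing works for the `start` and `end` columns of the NER output.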


Next, we use the `summarization` pipeline to summarize the text.

In [10]:
summarizer = pipeline('summarization')
outputs = summarizer(text, max_length=45, clean_up_tokenization_spaces=True)
print(outputs[0]['summary_text'])

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.
Your min_length=56 must be inferior than your max_length=45.


 Bumblebee ordered an Optimus Prime action figure from your online store in Germany. Unfortunately, when I opened the package, I discovered to my horror that he had been sent an action figure of Megatron instead.
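
The log above warns that the checkpoint's `min_length=56` is larger than the requested `max_length=45`. One way to avoid the warning is to pass both limits explicitly. The `summary_lengths` helper below is a hypothetical sketch that validates the pair before it reaches the pipeline, and `min_length=20` is an arbitrary illustrative value:

```python
def summary_lengths(min_length, max_length):
    """Validate and bundle the length limits passed to the summarizer."""
    if not 0 <= min_length < max_length:
        raise ValueError(
            f"min_length={min_length} must be smaller than max_length={max_length}"
        )
    return {"min_length": min_length, "max_length": max_length}


# Assumed values: 45 matches the call above, 20 is only an example.
length_kwargs = summary_lengths(min_length=20, max_length=45)
print(length_kwargs)
```

The limits could then be passed as `summarizer(text, clean_up_tokenization_spaces=True, **length_kwargs)`. The resulting summary would likely differ from the one printed above.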


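Next, we use a translation pipeline with the `Helsinki-NLP/opus-mt-en-de` model to translate the text into German.

In [3]: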
translator = pipeline("translation_en_to_de", model="Helsinki-NLP/opus-mt-en-de")
outputs = translator(text, clean_up_tokenization_spaces=True, min_length=100)
print(outputs[0]['translation_text'])



Sehr geehrter Amazon, letzte Woche habe ich eine Optimus Prime Action Figur aus Ihrem Online-Shop in Deutschland bestellt. Leider, als ich das Paket öffnete, entdeckte ich zu meinem Entsetzen, dass ich stattdessen eine Action Figur von Megatron geschickt worden war! Als lebenslanger Feind der Decepticons, Ich hoffe, Sie können mein Dilemma verstehen. Um das Problem zu lösen, Ich fordere einen Austausch von Megatron für die Optimus Prime Figur habe ich bestellt. Eingeschlossen sind Kopien meiner Aufzeichnungen über diesen Kauf. Ich erwarte, von Ihnen bald zu hören. Aufrichtig, Bumblebee.


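Finally, we use the `text-generation` pipeline with the `distilgpt2` model to draft a customer service reply.

In [5]: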
generator = pipeline('text-generation', model='distilgpt2')
response = 'Dear Bumblebee, I am sorry to hear that your order was mixed up.'
prompt = text + '\n\nCustomer service response:\n' + response
outputs = generator(prompt, max_length=200)
print(outputs[0]['generated_text'])

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Dear Amazon, last week I ordered an Optimus Prime action figure
    from your online store in Germany. Unfortunately, when I opened the package,
    I discovered to my horror that I had been sent an action figure of Megatron
    instead! As a lifelong enemy of the Decepticons, I hope you can understand my
    dilemma. To resolve the issue, I demand an exchange of Megatron for the
    Optimus Prime figure I ordered. Enclosed are copies of my records concerning
    this purchase. I expect to hear from you soon. Sincerely, Bumblebee.

Customer service response:
Dear Bumblebee, I am sorry to hear that your order was mixed up. Your package shipped to your home the following morning. Please don't let anything interfere with your normal business. I sincerely apologize to you. You must not let any inconvenience be overcome.
The Autobot's Prime Team
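
The generated text above repeats the whole prompt before the model's continuation. A minimal sketch of keeping only the new part follows. The `strip_prompt` helper is hypothetical, and the shortened `prompt` and `generated` strings stand in for the real values so the example runs without loading a model:

```python
def strip_prompt(generated_text, prompt):
    """Return only the text the model added after the prompt."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].lstrip()
    # Fall back to the full text if the prompt is not a prefix.
    return generated_text


# Shortened stand-ins for the prompt and output shown above.
prompt = (
    "Customer service response:\n"
    "Dear Bumblebee, I am sorry to hear that your order was mixed up."
)
generated = prompt + " Your package shipped to your home the following morning."

continuation = strip_prompt(generated, prompt)
print(continuation)
```

With the real pipeline this would be `strip_prompt(outputs[0]['generated_text'], prompt)`. The `text-generation` pipeline also accepts `return_full_text=False`, which should have a similar effect.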
