### Text operations with the Hugging Face transformers library

In [1]:
# Define the sample text used throughout this notebook
text = """Dear Amazon,last week I ordered a Optimus Prime action figure from your online store in Germany.Unfortunately,when I
opened the package,I discovered to my horror that I had been sent an action figure of Megatron instead!As a lifelong enemy of the Decepticons,
I hope you can understand my dilemma.To resolve this issue,I demand a exchange of Megatron for the Optimus Prime figure I ordered.
Enclosed are copies of my records concerning this purchase.I  expect to hear from you soon.Sincerely,Bumblebee.
"""

### Text classification

In [2]:
from transformers import pipeline

classifier = pipeline("text-classification")

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
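
The log recommends naming the model and revision explicitly instead of relying on the task default. A minimal sketch of one way to do that, using the checkpoint names and revision hashes reported in this notebook's own log output. `PINNED_MODELS`, `pinned_kwargs`, and `load_pinned` are hypothetical names introduced here for illustration, and it is an assumption that the short hashes printed in the logs are accepted as `revision` values.

```python
# Checkpoints and revisions copied from the "No model was supplied" log lines
# in this notebook. This table is a hypothetical helper, not part of transformers.
PINNED_MODELS = {
    "text-classification": ("distilbert-base-uncased-finetuned-sst-2-english", "af0f99b"),
    "ner": ("dbmdz/bert-large-cased-finetuned-conll03-english", "f2482bf"),
    "question-answering": ("distilbert-base-cased-distilled-squad", "626af31"),
    "summarization": ("sshleifer/distilbart-cnn-12-6", "a4f8f3e"),
    "text-generation": ("gpt2", "6c0e608"),
}


def pinned_kwargs(task):
    """Return the model/revision keyword arguments recorded for a task."""
    model, revision = PINNED_MODELS[task]
    return {"model": model, "revision": revision}


def load_pinned(task, **extra):
    """Build a pipeline with an explicit model and revision."""
    # Imported lazily so the table can be inspected without transformers installed.
    from transformers import pipeline

    return pipeline(task, **pinned_kwargs(task), **extra)
```

With this in place, `load_pinned("text-classification")` would stand in for `pipeline("text-classification")`, and extra arguments such as `aggregation_strategy="simple"` pass straight through.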


In [3]:
# Run the classifier and load the result into a DataFrame
import pandas as pd

outputs = classifier(text)
df = pd.DataFrame(outputs)

df.head()

|   | label | score |
|---|---|---|
| 0 | NEGATIVE | 0.926883 |


### Named entity recognition

In [4]:
ner_tagger = pipeline("ner",aggregation_strategy="simple")

outputs = ner_tagger(text)

pd.DataFrame(outputs).head()

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.weight', 'bert.pooler.dense.bias']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


|   | entity_group | score | word | start | end |
|---|---|---|---|---|---|
| 0 | ORG | 0.878396 | Amazon | 5 | 11 |
| 1 | MISC | 0.990415 | Optimus Prime | 34 | 47 |
| 2 | LOC | 0.999754 | Germany | 88 | 95 |
| 3 | MISC | 0.558927 | Mega | 203 | 207 |
| 4 | PER | 0.587491 | ##tron | 207 | 211 |
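
The last two rows show the tokenizer splitting "Megatron" into `Mega` and `##tron`, with each piece assigned a different label. A minimal post-processing sketch, assuming that a piece starting with `##` which begins exactly where the previous entity ends belongs to the same word. `merge_subwords` is a hypothetical helper, and keeping the label of the higher-scoring piece is a design choice made here, not something the library does.

```python
def merge_subwords(entities):
    """Merge '##'-prefixed continuation pieces into the preceding entity."""
    merged = []
    for ent in entities:
        prev = merged[-1] if merged else None
        if prev and ent["word"].startswith("##") and ent["start"] == prev["end"]:
            # Keep the label of whichever piece the model scored higher.
            if ent["score"] > prev["score"]:
                prev["entity_group"] = ent["entity_group"]
                prev["score"] = ent["score"]
            # Drop the '##' marker and extend the span to cover both pieces.
            prev["word"] += ent["word"][2:]
            prev["end"] = ent["end"]
        else:
            # Copy so the caller's dictionaries are left untouched.
            merged.append(dict(ent))
    return merged


# Rows 3 and 4 from the table above.
rows = [
    {"entity_group": "MISC", "score": 0.558927, "word": "Mega", "start": 203, "end": 207},
    {"entity_group": "PER", "score": 0.587491, "word": "##tron", "start": 207, "end": 211},
]
merged = merge_subwords(rows)
print(merged[0]["word"], merged[0]["start"], merged[0]["end"])  # Megatron 203 211
```

The merged entry takes the `PER` label because that piece scored slightly higher; a different rule (for example, always keeping the first piece's label) may suit other data better.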


### Question answering

In [5]:
reader = pipeline("question-answering")
question = "What did the customer want?"
outputs = reader(question=question,context=text)
pd.DataFrame(outputs,index=[0])

No model was supplied, defaulted to distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.


|   | score | start | end | answer |
|---|---|---|---|---|
| 0 | 0.566709 | 328 | 350 | a exchange of Megatron |
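
The `start` and `end` values are character offsets into the context string, so the answer can be recovered by slicing. A small check of that, assuming the text is reproduced exactly as declared at the top of the notebook (with no trailing whitespace on any line) and using the offsets from the output above:

```python
text = """Dear Amazon,last week I ordered a Optimus Prime action figure from your online store in Germany.Unfortunately,when I
opened the package,I discovered to my horror that I had been sent an action figure of Megatron instead!As a lifelong enemy of the Decepticons,
I hope you can understand my dilemma.To resolve this issue,I demand a exchange of Megatron for the Optimus Prime figure I ordered.
Enclosed are copies of my records concerning this purchase.I  expect to hear from you soon.Sincerely,Bumblebee.
"""

# Offsets reported by the question-answering pipeline.
start, end = 328, 350

# The end offset is exclusive, which matches Python slice semantics.
answer = text[start:end]
print(answer)  # a exchange of Megatron
```

Because the model extracts a span rather than generating text, the answer preserves the wording of the original letter, including its "a exchange" grammar slip.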


### Summarization

In [6]:
summarizer = pipeline("summarization")
outputs = summarizer(text, max_length=45, clean_up_tokenization_spaces=True)
print(outputs[0]['summary_text'])

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.
Your min_length=56 must be inferior than your max_length=45.


 Bumblebee ordered a Optimus Prime action figure from your online store in Germany. When he opened the package, he discovered to his horror that he had been sent an action figure of Megatron instead. Bumble
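
The log line `Your min_length=56 must be inferior than your max_length=45` appears because the call sets only `max_length`; the value 56 presumably comes from the checkpoint's default generation settings. The summary also stops mid-word ("Bumble") because generation is cut off at the token limit. A minimal sketch of choosing a consistent pair of bounds, where `length_bounds` is a hypothetical helper and halving `max_length` is an arbitrary choice made for illustration:

```python
def length_bounds(max_length, min_length=56):
    """Return generation kwargs with min_length kept below max_length."""
    if min_length >= max_length:
        # Arbitrary fallback: allow summaries as short as half the maximum.
        min_length = max(1, max_length // 2)
    return {"min_length": min_length, "max_length": max_length}


kwargs = length_bounds(45)
print(kwargs)  # {'min_length': 22, 'max_length': 45}
```

Passing these as `**kwargs` to the summarization call in place of `max_length=45` alone should avoid the warning. Raising `max_length` is the other option if the truncated ending matters more than brevity.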


### Translation

In [7]:
!pip install sentencepiece

Looking in indexes: https://mirrors.ustc.edu.cn/pypi/web/simple

In [9]:
translator = pipeline("translation_en_to_zh",model="Helsinki-NLP/opus-mt-en-zh")
outputs = translator(text,clean_up_tokenization_spaces=True)

print(outputs[0]['translation_text'])



亲爱的亚马逊,上星期我从你在德国的网上商店 订购了一台Optimus Prime Action 图形。 不幸的是,当我打开这个软件包时,我惊恐地发现,我被派去的是威震天的动作图!作为霸天虎的终身敌人,我希望你能够理解我的两难处境。为了解决这个问题,我要求用威震天王交换我订购的Optimus Prime 图形。我附上我购买该软件的记录的副本。我期待很快从你那里听到。


### Text generation

In [11]:
generator = pipeline("text-generation")
response = "Dear Bumblebee,I am sorry to hear that your order was mixed up."
# Build the prompt: the customer's letter followed by the start of a reply
prompt = text + "\n\nCustomer service response:\n" + response
# Generate a continuation of the prompt
outputs = generator(prompt,max_length=200)
print(outputs[0]['generated_text'])

No model was supplied, defaulted to gpt2 and revision 6c0e608 (https://huggingface.co/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Dear Amazon,last week I ordered a Optimus Prime action figure from your online store in Germany.Unfortunately,when I
opened the package,I discovered to my horror that I had been sent an action figure of Megatron instead!As a lifelong enemy of the Decepticons,
I hope you can understand my dilemma.To resolve this issue,I demand a exchange of Megatron for the Optimus Prime figure I ordered.
Enclosed are copies of my records concerning this purchase.I  expect to hear from you soon.Sincerely,Bumblebee.


Customer service response:
Dear Bumblebee,I am sorry to hear that your order was mixed up.My name is Bumblebee. I have seen many reviews on here. I have worked as an employee in the electronics industry for three or four years working as a technician. Over the years there was an increasing awareness that there were some people who were unhappy and frustrated with Optimus Prime's antics and he was often portrayed
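
As the output shows, `generated_text` contains the full prompt followed by the model's continuation. A minimal sketch of isolating just the new text; `strip_prompt` is a hypothetical helper, and the short `prompt` and `generated` strings below are stand-ins for the real values so the example runs without a model.

```python
def strip_prompt(generated_text, prompt):
    """Return only the text the model appended after the prompt."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):]
    # Fall back to the full text if the prompt was altered, e.g. truncated.
    return generated_text


# Stand-in values; in the notebook these would be `prompt` and
# outputs[0]['generated_text'].
prompt = "Customer service response:\nDear Bumblebee,I am sorry to hear that your order was mixed up."
generated = prompt + "My name is Bumblebee."

continuation = strip_prompt(generated, prompt)
print(continuation)  # My name is Bumblebee.
```

The text-generation pipeline also accepts `return_full_text=False`, which should give the same result without post-processing.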
