### <b style="color: #5b7daf">01.Transformer</b>

> #### <b style="color: #f86461">RNN</b>
> ![RNN](./resources/RNN.png)
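> - Useful for language modeling  
> - Each RNN cell carries a hidden state  
> - Each step takes two inputs: the previous state and the current input  
> - The whole input sequence gets compressed into a single state, which loses information  
> - Attention overcomes this by letting the model look at every intermediate state, not just the last one  
> - Still limited by sequential input: each step must wait for the previous one, so the computation cannot be parallelized across the sequence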
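
The bullets above can be made concrete with a toy, pure-Python sketch (no deep learning library assumed). `run_rnn` feeds an Elman-style cell one step at a time, so each step has to wait for the previous state. `dot_attention` is plain dot-product attention (one of several possible scoring functions, chosen here for brevity) that lets a query mix information from every intermediate state instead of relying only on the last one. The weights, the 2-dimensional sizes, the missing biases, and the use of the last state as the query are all illustrative assumptions.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product on plain lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def rnn_step(h: Vector, x: Vector, w_h: Matrix, w_x: Matrix) -> Vector:
    """One step: two inputs (previous state h, current input x), one new state."""
    return [math.tanh(a + b) for a, b in zip(matvec(w_h, h), matvec(w_x, x))]


def run_rnn(xs: List[Vector], w_h: Matrix, w_x: Matrix) -> List[Vector]:
    """Run the cell over a sequence and return every intermediate state."""
    h = [0.0] * len(w_h)
    states = []
    for x in xs:  # strictly sequential: step t needs the state from step t-1
        h = rnn_step(h, x, w_h, w_x)
        states.append(h)
    return states


def dot_attention(query: Vector, states: List[Vector]) -> Tuple[Vector, Vector]:
    """Softmax over dot-product scores, then a weighted sum of all states."""
    scores = [sum(q * s for q, s in zip(query, state)) for state in states]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract the max for numerical stability
    total = sum(exps)
    weights = [e / total for e in exps]
    context = [sum(w * state[i] for w, state in zip(weights, states)) for i in range(len(query))]
    return weights, context


# Hypothetical 2-dimensional weights and a 3-step input sequence
w_h = [[0.5, 0.0], [0.0, 0.5]]
w_x = [[1.0, 0.0], [0.0, 1.0]]
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

states = run_rnn(xs, w_h, w_x)
last_state = states[-1]                               # the single compressed summary
weights, context = dot_attention(last_state, states)  # a mix of every step's state
print(weights)
```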

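> #### <b style="color: #f86461">Typical model training workflow</b>
> - Pretraining -> transfer learning / fine-tuning -> application to the target domain  
> - Implement the architecture -> load pretrained weights -> preprocess the data -> implement a data loader -> define the loss and optimizer
> - Without a standardized codebase, adapting this pipeline to a new domain is difficult  
> - Hence **Hugging Face**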
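
A toy, pure-Python sketch that maps each step of the second bullet onto a few lines of code. Everything here is hypothetical (a one-feature linear model, a dict standing in for a checkpoint file, four made-up data points); in practice each step is written against a framework such as PyTorch, and every project tends to write these steps a little differently, which is the standardization problem the third bullet points at.

```python
import random


# 1) Architecture: a hypothetical one-feature linear model, y = w * x + b
class LinearModel:
    def __init__(self) -> None:
        self.w = 0.0
        self.b = 0.0

    def __call__(self, x: float) -> float:
        return self.w * x + self.b


# 2) Load "pretrained" weights: a hard-coded dict stands in for a checkpoint
pretrained = {"w": 1.5, "b": 0.0}
model = LinearModel()
model.w, model.b = pretrained["w"], pretrained["b"]

# 3) Preprocess: parse raw "x,y" strings into float pairs
raw = ["1,2.1", "2,3.9", "3,6.2", "4,8.1"]
data = [tuple(float(v) for v in line.split(",")) for line in raw]


# 4) Data loader: yield shuffled mini-batches
def data_loader(rows, batch_size, seed):
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)
    for i in range(0, len(order), batch_size):
        yield [rows[j] for j in order[i:i + batch_size]]


# 5) Loss (mean squared error) and optimizer (plain SGD)
def mse(m, batch):
    return sum((m(x) - y) ** 2 for x, y in batch) / len(batch)


def sgd_step(m, batch, lr):
    n = len(batch)
    grad_w = sum(2 * (m(x) - y) * x for x, y in batch) / n
    grad_b = sum(2 * (m(x) - y) for x, y in batch) / n
    m.w -= lr * grad_w
    m.b -= lr * grad_b


# Fine-tune: start from the pretrained weights and keep training on the new data
loss_before = mse(model, data)
for epoch in range(50):
    for batch in data_loader(data, batch_size=2, seed=epoch):
        sgd_step(model, batch, lr=0.01)
loss_after = mse(model, data)
print(f"loss {loss_before:.3f} -> {loss_after:.3f}")
```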

In [1]:
text = """Dear Amazon, last week I ordered an Optimus Prime action figure from your
online store in Germany. Unfortunately, when I opened the package, I discovered to
my horror that I had been sent an action figure of Megatron instead! As  a lifelong
enemy of the Decepticons, I hope you can understand my dilemma. To resolve the
issue, I demand an exchange of Megatron for the Optimus Prime figure I ordered.
Enclosed are copies of my records concerning this purchase. I expect to hear from
you soon. Sincerely, Bumblebee."""

In [2]:
# Text classification
from transformers import pipeline
classifier = pipeline('text-classification')

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [3]:
import pandas as pd
outputs = classifier(text)
pd.DataFrame(outputs)

|   | label | score |
|---|---|---|
| 0 | NEGATIVE | 0.901546 |


In [4]:
half_size = len(text) // 2
head, tail = text[:half_size], text[half_size:]
outputs = classifier([head, tail])
pd.DataFrame(outputs)

|   | label | score |
|---|---|---|
| 0 | NEGATIVE | 0.999312 |
| 1 | POSITIVE | 0.990263 |
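
Slicing at `len(text) // 2` ignores word boundaries: here it cuts "Decepticons" in two, which is why the NER cell below tags `De` at the end of the head and `##ico` near the start of the tail. A small helper that cuts at the nearest whitespace instead; `split_near_middle` is a hypothetical name for this sketch, not part of transformers.

```python
from typing import Tuple


def split_near_middle(text: str) -> Tuple[str, str]:
    """Split text in two at the whitespace character closest to the midpoint.

    The whitespace character at the cut is dropped. If the text contains no
    whitespace at all, fall back to a plain character split.
    """
    mid = len(text) // 2
    spaces = [i for i, ch in enumerate(text) if ch.isspace()]
    if not spaces:
        return text[:mid], text[mid:]
    cut = min(spaces, key=lambda i: abs(i - mid))
    return text[:cut], text[cut + 1:]


head_demo, tail_demo = split_near_middle("Optimus Prime versus Megatron")
print(repr(head_demo), repr(tail_demo))
```

In the notebook this would replace the slicing with `head, tail = split_near_middle(text)`, so neither half starts or ends with a word fragment.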


In [5]:
# Named entity recognition (NER)
ner_tagger = pipeline('ner', aggregation_strategy='simple')
outputs = ner_tagger(text)
pd.DataFrame(outputs)

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.weight', 'bert.pooler.dense.bias']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


|   | entity_group | score | word | start | end |
|---|---|---|---|---|---|
| 0 | ORG | 0.87901 | Amazon | 5 | 11 |
| 1 | MISC | 0.990859 | Optimus Prime | 36 | 49 |
| 2 | LOC | 0.999755 | Germany | 90 | 97 |
| 3 | MISC | 0.556567 | Mega | 208 | 212 |
| 4 | PER | 0.590256 | ##tron | 212 | 216 |
| 5 | ORG | 0.669691 | Decept | 254 | 260 |
| 6 | MISC | 0.498349 | ##icons | 260 | 265 |
| 7 | MISC | 0.775361 | Megatron | 351 | 359 |
| 8 | MISC | 0.987854 | Optimus Prime | 368 | 381 |
| 9 | PER | 0.812097 | Bumblebee | 503 | 512 |


In [6]:
multi_outputs = ner_tagger([head, tail])
for output in multi_outputs:
    print(pd.DataFrame(output))

  entity_group     score           word  start  end
0          ORG  0.927624         Amazon      5   11
1         MISC  0.983534  Optimus Prime     36   49
2          LOC  0.999749        Germany     90   97
3          PER  0.813427       Megatron    208  216
4          ORG  0.569146             De    254  256
  entity_group     score           word  start  end
0         MISC  0.508540          ##ico      4    7
1         MISC  0.511747           Mega     95   99
2          PER  0.590190         ##tron     99  103
3         MISC  0.961195  Optimus Prime    112  125
4          PER  0.780502      Bumblebee    247  256
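
With `aggregation_strategy='simple'`, a word split into subword pieces can come back as separate entities with different labels (`Mega` as MISC and `##tron` as PER). transformers also documents other `aggregation_strategy` values (`'first'`, `'average'`, `'max'`) that assign one label per word, which may be the simpler fix. As a post-processing alternative, here is a hypothetical helper (not a transformers API) that glues contiguous `##` pieces back onto the preceding entity; keeping the first fragment's label and the lowest fragment score are design choices of this sketch.

```python
from typing import Dict, List


def merge_subword_entities(entities: List[Dict]) -> List[Dict]:
    """Glue '##' word pieces back onto the entity that directly precedes them.

    An entity is merged into the previous one when its word starts with '##'
    and its start offset equals the previous entity's end offset. The merged
    entity keeps the first fragment's entity_group and the lowest score.
    """
    merged: List[Dict] = []
    for ent in entities:
        prev = merged[-1] if merged else None
        if prev is not None and ent["word"].startswith("##") and ent["start"] == prev["end"]:
            prev["word"] += ent["word"][2:]
            prev["end"] = ent["end"]
            prev["score"] = min(prev["score"], ent["score"])
        else:
            merged.append(dict(ent))  # copy so the caller's dicts are not mutated
    return merged


# Rows copied from the full-text NER output above
rows = [
    {"entity_group": "MISC", "score": 0.556567, "word": "Mega", "start": 208, "end": 212},
    {"entity_group": "PER", "score": 0.590256, "word": "##tron", "start": 212, "end": 216},
    {"entity_group": "ORG", "score": 0.669691, "word": "Decept", "start": 254, "end": 260},
    {"entity_group": "MISC", "score": 0.498349, "word": "##icons", "start": 260, "end": 265},
]
for ent in merge_subword_entities(rows):
    print(ent["entity_group"], ent["word"], ent["start"], ent["end"])
```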


In [7]:
# Question answering
reader = pipeline('question-answering')
question = 'What does the customer want?'
outputs = reader(question=question, context=text)
pd.DataFrame([outputs])

No model was supplied, defaulted to distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.


|   | score | start | end | answer |
|---|---|---|---|---|
| 0 | 0.631292 | 336 | 359 | an exchange of Megatron |
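
The `start` and `end` fields are character offsets into `context` (the NER offsets behave the same way: `Amazon` is reported at 5 to 11, and `text[5:11]` is `"Amazon"`). A hypothetical helper, not part of transformers, that checks the offsets against the answer and shows it with a little surrounding context:

```python
def show_answer_in_context(context: str, result: dict, window: int = 30) -> str:
    """Return the answer in brackets with up to `window` characters on each side.

    Assumes result["start"] and result["end"] are character offsets into
    `context` and result["answer"] is the matching substring.
    """
    start, end = result["start"], result["end"]
    if context[start:end] != result["answer"]:
        raise ValueError("start/end offsets do not match the answer text")
    before = context[max(0, start - window):start]
    after = context[end:end + window]
    return f"...{before}[{context[start:end]}]{after}..."


# Toy check, reusing the offsets of "Amazon" (5 to 11) from the NER output
demo_context = "Dear Amazon, last week I ordered an Optimus Prime action figure"
demo_result = {"start": 5, "end": 11, "answer": "Amazon"}
print(show_answer_in_context(demo_context, demo_result, window=5))
```

Run right after the question-answering cell, `print(show_answer_in_context(text, outputs))` should show where "an exchange of Megatron" sits in the letter.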


In [8]:
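# Summarization
summarizer = pipeline('summarization')
# The default min_length picked up for this model (56, per the warning below) exceeds max_length=45;
# passing a smaller min_length explicitly avoids the conflict.
outputs = summarizer(text, max_length=45, clean_up_tokenization_spaces=True)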
print(outputs)

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.
Your min_length=56 must be inferior than your max_length=45.


[{'summary_text': ' Bumblebee ordered an Optimus Prime action figure from your online store in Germany. Unfortunately, when he opened the package, he discovered to his horror that he had been sent an action figure of Megatron instead.'}]


In [9]:
print(outputs[0]['summary_text'])

 Bumblebee ordered an Optimus Prime action figure from your online store in Germany. Unfortunately, when he opened the package, he discovered to his horror that he had been sent an action figure of Megatron instead.


In [10]:
print(summarizer(text, max_length=20, clean_up_tokenization_spaces=True)[0]['summary_text'])

Your min_length=56 must be inferior than your max_length=20.


 Bumblebee ordered an Optimus Prime action figure from an online store in Germany. When


In [12]:
# sacremoses is recommended by the Marian tokenizer behind Helsinki-NLP/opus-mt-en-de
!pip install sacremoses

Collecting sacremoses
  Obtaining dependency information for sacremoses from https://files.pythonhosted.org/packages/0b/f0/89ee2bc9da434bd78464f288fdb346bc2932f2ee80a90b2a4bbbac262c74/sacremoses-0.1.1-py3-none-any.whl.metadata
  Downloading sacremoses-0.1.1-py3-none-any.whl.metadata (8.3 kB)
Downloading sacremoses-0.1.1-py3-none-any.whl (897 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 897.5/897.5 kB 7.9 MB/s eta 0:00:00
Installing collected packages: sacremoses
Successfully installed sacremoses-0.1.1


In [13]:
# Translation (English to German)
translator = pipeline('translation_en_to_de', model='Helsinki-NLP/opus-mt-en-de')
outputs = translator(text, clean_up_tokenization_spaces=True, min_length=100)
print(outputs[0]['translation_text'])

Sehr geehrter Amazon, letzte Woche habe ich eine Optimus Prime Action Figur aus Ihrem Online-Shop in Deutschland bestellt. Leider, als ich das Paket öffnete, entdeckte ich zu meinem Entsetzen, dass ich stattdessen eine Action Figur von Megatron geschickt worden war! Als lebenslanger Feind der Decepticons, Ich hoffe, Sie können mein Dilemma verstehen. Um das Problem zu lösen, Ich fordere einen Austausch von Megatron für die Optimus Prime Figur habe ich bestellt. Eingeschlossen sind Kopien meiner Aufzeichnungen über diesen Kauf. Ich erwarte, von Ihnen bald zu hören. Aufrichtig, Bumblebee.


In [14]:
def print_long_text(text, max_length=10):
    # Print text, starting a new line before every `max_length`-th word
    words = text.split(' ')
    for i, word in enumerate(words):
        if i % max_length == 0:
            print()
        print(word, end=' ')

In [15]:
print_long_text(outputs[0]['translation_text'])


Sehr geehrter Amazon, letzte Woche habe ich eine Optimus Prime 
Action Figur aus Ihrem Online-Shop in Deutschland bestellt. Leider, als 
ich das Paket öffnete, entdeckte ich zu meinem Entsetzen, dass 
ich stattdessen eine Action Figur von Megatron geschickt worden war! 
Als lebenslanger Feind der Decepticons, Ich hoffe, Sie können mein 
Dilemma verstehen. Um das Problem zu lösen, Ich fordere einen 
Austausch von Megatron für die Optimus Prime Figur habe ich 
bestellt. Eingeschlossen sind Kopien meiner Aufzeichnungen über diesen Kauf. Ich 
erwarte, von Ihnen bald zu hören. Aufrichtig, Bumblebee. 

In [16]:
# Text generation
generator = pipeline('text-generation')
response = 'Dear Bumblebee, I am sorry to hear that your order was mixed up.'
prompt = text + '\n=============\nCustomer service response:\n' + response
outputs = generator(prompt, max_length=200)
print_long_text(outputs[0]['generated_text'])

No model was supplied, defaulted to gpt2 and revision 6c0e608 (https://huggingface.co/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.



Dear Amazon, last week I ordered an Optimus Prime action 
figure from your
online store in Germany. Unfortunately, when I opened 
the package, I discovered to
my horror that I had been 
sent an action figure of Megatron instead! As  a 
lifelong
enemy of the Decepticons, I hope you can understand my 
dilemma. To resolve the
issue, I demand an exchange of Megatron 
for the Optimus Prime figure I ordered.
Enclosed are copies of 
my records concerning this purchase. I expect to hear from
you 
soon. Sincerely, Bumblebee.
Customer service response:
Dear Bumblebee, I am sorry to 
hear that your order was mixed up. It is as 
if I was told that I could not get the 
product to my
the shelves, or that it needed to be 
resold. You are correct. I was told that it would 
not be necessary to send
me the product back for new 
purchase at that time. The order 
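
`print_long_text` splits on single spaces only, so the newlines inside the prompt stay attached to words and produce the ragged lines above, and the `print()` at `i == 0` adds a leading blank line. A hypothetical replacement (`wrap_words` and `print_wrapped` are names chosen for this sketch) that splits on any whitespace and prints complete lines:

```python
from typing import List


def wrap_words(text: str, words_per_line: int = 10) -> List[str]:
    """Group words into lines of at most `words_per_line` words.

    str.split() with no argument splits on any run of whitespace, including
    newlines, so existing line breaks no longer leave short, ragged lines.
    """
    words = text.split()
    return [" ".join(words[i:i + words_per_line]) for i in range(0, len(words), words_per_line)]


def print_wrapped(text: str, words_per_line: int = 10) -> None:
    # One print per finished line, so there is no leading blank line
    for line in wrap_words(text, words_per_line):
        print(line)


print_wrapped("Dear Bumblebee,\nI am sorry to hear that your order was mixed up.", words_per_line=5)
```

For wrapping by character width instead of word count, the standard library's `textwrap.fill` is another option.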

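> #### <b style="color: #f86461">The Hugging Face ecosystem</b>
> - Hub: hosts pretrained models and datasets
> - Tokenizers: fast tokenizer implementations (Rust core with Python bindings)
> - Datasets: loading and processing datasets
> - Accelerate: running the same PyTorch training code on different hardware setups (multi-GPU, TPU, mixed precision)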
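
The `##` fragments in the NER output (`Mega`/`##tron`, `Decept`/`##icons`) come from subword tokenization. BERT-family models such as the NER checkpoint used above rely on WordPiece, one of the models the Tokenizers library implements. Below is a minimal sketch of WordPiece's greedy longest-match-first lookup for a single word; the toy vocabulary is made up for this example, and real tokenizers also normalize and pre-split text on whitespace and punctuation, so the actual pieces depend on the model's learned vocabulary.

```python
from typing import List, Set


def wordpiece_tokenize(word: str, vocab: Set[str], unk: str = "[UNK]") -> List[str]:
    """Greedy longest-match-first WordPiece on a single word.

    Pieces that do not start the word are looked up with a '##' prefix.
    If some position cannot be matched at all, the whole word becomes `unk`.
    """
    tokens: List[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        while start < end:  # try the longest remaining substring first
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]
        tokens.append(piece)
        start = end
    return tokens


# Hypothetical toy vocabulary; a real model's vocabulary is learned and far larger
VOCAB = {"Optimus", "Prime", "Mega", "##tron", "Decept", "##icons"}
for w in ["Optimus", "Megatron", "Decepticons", "Bumblebee"]:
    print(w, wordpiece_tokenize(w, VOCAB))
```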