# What can Transformers do?

* Inflearn course: https://inf.run/FX4TP
* Hugging Face tutorial (Korean): https://huggingface.co/learn/nlp-course/ko/chapter1/1
* Running the exercises on Colab or another GPU-enabled environment is recommended.

Install the Transformers, Datasets, and Evaluate libraries to run this notebook.

In [None]:
!pip install datasets evaluate transformers[sentencepiece]
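
Since a GPU environment is recommended, it can help to check which device is available before building pipelines. The helper below is a minimal sketch: `pick_device` is a hypothetical name, and it assumes PyTorch is the backend. It returns `0` (the first CUDA GPU) or `-1` (CPU), the integer convention accepted by the `device` argument of `pipeline`.

```python
def pick_device():
    """Return 0 (first CUDA GPU) when one is available, otherwise -1 (CPU)."""
    try:
        import torch  # assumed backend; fall back to CPU if it is missing
    except ImportError:
        return -1
    return 0 if torch.cuda.is_available() else -1


device = pick_device()
print(f"pipeline device index: {device}")

# Example use once transformers is installed:
# classifier = pipeline("sentiment-analysis", device=device)
```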

## sentiment-analysis

In [None]:
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
classifier("I've been waiting for a HuggingFace course my whole life.")

In [None]:
# Uncomment the next line in Jupyter/Colab to display the pipeline docstring.
# pipeline?

In [None]:
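# The default checkpoint is an English sentiment model, so treat the
# predictions for the Korean inputs below as unreliable.
classifier(
    ["I've been waiting for a HuggingFace course my whole life.",
     "I hate this so much!",
     "행복하다",  # "happy"
     "즐겁다",  # "joyful"
     "힘들다"]  # "it's hard"
)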
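
The sentiment pipeline returns one dict per input, each with a `label` and a `score`. As a rough sketch of how those results might be post-processed, the helper below separates confident predictions from uncertain ones. `split_by_confidence`, the `0.9` threshold, and the sample scores are all hypothetical; the scores are not real model output.

```python
def split_by_confidence(texts, results, threshold=0.9):
    """Pair each text with its prediction and bucket it by score."""
    confident, uncertain = [], []
    for text, result in zip(texts, results):
        bucket = confident if result["score"] >= threshold else uncertain
        bucket.append((text, result["label"], result["score"]))
    return confident, uncertain


# Hypothetical results, shaped like the output of pipeline("sentiment-analysis").
sample_texts = ["I hate this so much!", "행복하다"]
sample_results = [
    {"label": "NEGATIVE", "score": 0.9995},
    {"label": "POSITIVE", "score": 0.62},
]

confident, uncertain = split_by_confidence(sample_texts, sample_results)
print("confident:", confident)
print("uncertain:", uncertain)
```

With a real pipeline, `results` would be `classifier(sample_texts)`.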

## zero-shot-classification
* [facebook/bart-large-mnli · Hugging Face](https://huggingface.co/facebook/bart-large-mnli)

In [None]:
from transformers import pipeline

classifier = pipeline("zero-shot-classification")
classifier(
    "This is a course about the Transformers library",
    candidate_labels=["education", "politics", "business"],
)

In [None]:
sequence_to_classify = "one day I will see the world"
candidate_labels = ['travel', 'cooking', 'dancing']
classifier(sequence_to_classify, candidate_labels)
#{'labels': ['travel', 'dancing', 'cooking'],
# 'scores': [0.9938651323318481, 0.0032737774308770895, 0.002861034357920289],
# 'sequence': 'one day I will see the world'}


In [None]:
candidate_labels = ['travel', 'cooking', 'dancing', 'exploration']
classifier(sequence_to_classify, candidate_labels, multi_label=True)
#{'labels': ['travel', 'exploration', 'dancing', 'cooking'],
# 'scores': [0.9945111274719238,
#  0.9383890628814697,
#  0.0057061901316046715,
#  0.0018193122232332826],
# 'sequence': 'one day I will see the world'}
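
The two outputs above differ in how scores are normalized: without `multi_label` the scores sum to 1 across labels, while with `multi_label=True` each label is scored independently. The sketch below reproduces that difference in plain Python. It assumes, to my understanding of the pipeline, that single-label mode applies a softmax over the entailment logits of all labels and multi-label mode applies a softmax over each label's (contradiction, entailment) pair. The logits are hypothetical, not real model output.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def single_label_scores(entailment_logits):
    """Labels compete: one softmax across all entailment logits."""
    return softmax(entailment_logits)


def multi_label_scores(contradiction_logits, entailment_logits):
    """Labels are independent: entailment vs. contradiction per label."""
    return [
        softmax([c, e])[1]
        for c, e in zip(contradiction_logits, entailment_logits)
    ]


# Hypothetical NLI logits for: travel, cooking, dancing, exploration
entailment = [4.0, -3.0, -2.0, 3.0]
contradiction = [-3.0, 3.5, 3.0, -1.0]

single = single_label_scores(entailment)
multi = multi_label_scores(contradiction, entailment)
print("single-label sum:", sum(single))
print("multi-label sum:", sum(multi))
```

This is why both `travel` and `exploration` can score above 0.9 in the multi-label run.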


## text-generation

* [gpt2 · Hugging Face](https://huggingface.co/gpt2)

In [None]:
from transformers import pipeline

generator = pipeline("text-generation")
generator("In this course, we will teach you how to")

* [skt/kogpt2-base-v2 · Hugging Face](https://huggingface.co/skt/kogpt2-base-v2)

In [None]:
from transformers import pipeline, set_seed
generator = pipeline('text-generation', model='skt/kogpt2-base-v2')
set_seed(42)
# Korean prompt: "Lunch menu recommendation,"
generator("점심 메뉴 추천,", max_length=30, num_return_sequences=3)

In [None]:
from transformers import pipeline

generator = pipeline("text-generation", model="distilgpt2")
generator(
    "In this course, we will teach you how to",
    max_length=30,
    num_return_sequences=6,
)
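
The text-generation pipeline returns a list of dicts, each with a `generated_text` that by default includes the prompt. A small sketch for keeping only the continuations follows. `continuations` is a hypothetical helper, and the sample outputs are made up for illustration.

```python
def continuations(prompt, outputs):
    """Strip the leading prompt from each generated_text, if present."""
    texts = []
    for out in outputs:
        text = out["generated_text"]
        if text.startswith(prompt):
            text = text[len(prompt):]
        texts.append(text.strip())
    return texts


prompt = "In this course, we will teach you how to"

# Hypothetical outputs, shaped like the result of pipeline("text-generation").
sample_outputs = [
    {"generated_text": prompt + " build a model."},
    {"generated_text": prompt + " read the docs."},
]

for line in continuations(prompt, sample_outputs):
    print(line)
```

Passing `return_full_text=False` when calling the generator should give a similar result without the helper.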

## fill-mask

In [None]:
from transformers import pipeline

unmasker = pipeline("fill-mask")
unmasker("This course will teach you all about <mask> models.", top_k=5)

* [klue/bert-base · Hugging Face](https://huggingface.co/klue/bert-base)

In [None]:
unmasker_klue = pipeline("fill-mask", model="klue/bert-base")
# Korean prompt: "The food Koreans like is [MASK]." (BERT-style mask token)
unmasker_klue("한국인이 좋아하는 음식은 [MASK] 입니다.", top_k=3)
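
The two fill-mask cells use different mask tokens: `<mask>` for the default model and `[MASK]` for `klue/bert-base`. One way to avoid hard-coding the token is to write prompts with a placeholder and substitute the tokenizer's own mask token. This is a minimal sketch; `make_masked_prompt` and the `{mask}` placeholder are hypothetical conventions, not part of the library.

```python
def make_masked_prompt(template, mask_token, placeholder="{mask}"):
    """Replace the placeholder in the template with the model's mask token."""
    if placeholder not in template:
        raise ValueError(f"template must contain {placeholder!r}")
    return template.replace(placeholder, mask_token)


roberta_style = make_masked_prompt(
    "This course will teach you all about {mask} models.", "<mask>"
)
bert_style = make_masked_prompt("한국인이 좋아하는 음식은 {mask} 입니다.", "[MASK]")

print(roberta_style)
print(bert_style)
```

With a loaded pipeline, the token can be read from `unmasker.tokenizer.mask_token` and passed as `mask_token`.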

## NER (Named Entity Recognition)

* [dbmdz/bert-large-cased-finetuned-conll03-english · Hugging Face](https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english)

In [None]:
from transformers import pipeline

# aggregation_strategy="simple" replaces the deprecated grouped_entities=True.
ner = pipeline("ner", aggregation_strategy="simple")
ner("My name is Sylvain and I work at Hugging Face in Brooklyn.")

In [None]:
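# Korean input: "Hello. This is the todaycode YouTube channel. This is the Republic of Korea."
# The default NER checkpoint was fine-tuned on English CoNLL-03, so expect weak results here.
ner("안녕하세요. 오늘코드 유튜브 채널입니다. 여기는 대한민국입니다.")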

In [None]:
ner("Hello, I'm Korean")
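
When entities are grouped, the NER pipeline returns dicts with `entity_group`, `score`, `word`, `start`, and `end`. The sketch below collects the words by entity type. `entities_by_type` is a hypothetical helper, and the sample list is illustrative (scores are rounded placeholders), not captured model output.

```python
from collections import defaultdict


def entities_by_type(entities):
    """Map each entity_group (PER, ORG, LOC, ...) to the words tagged with it."""
    grouped = defaultdict(list)
    for entity in entities:
        grouped[entity["entity_group"]].append(entity["word"])
    return dict(grouped)


text = "My name is Sylvain and I work at Hugging Face in Brooklyn."

# Illustrative entities, shaped like grouped output of pipeline("ner").
sample_entities = [
    {"entity_group": "PER", "score": 0.99, "word": "Sylvain", "start": 11, "end": 18},
    {"entity_group": "ORG", "score": 0.98, "word": "Hugging Face", "start": 33, "end": 45},
    {"entity_group": "LOC", "score": 0.99, "word": "Brooklyn", "start": 49, "end": 57},
]

grouped = entities_by_type(sample_entities)
print(grouped)
```

The `start` and `end` offsets index into the original string, so `text[start:end]` recovers each entity.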

## question-answering

In [None]:
from transformers import pipeline

question_answerer = pipeline("question-answering")
question_answerer(
    question="Where do I work?",
    context="My name is Sylvain and I work at todaycode in seoul",
)
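
The question-answering pipeline is extractive: its result holds an `answer` plus `start` and `end` character offsets into the context. The sketch below uses those offsets to show the answer with some surrounding text. `answer_with_window` is a hypothetical helper, and the sample result (including its score) is constructed by hand rather than taken from a model.

```python
def answer_with_window(context, result, window=10):
    """Return the answer span plus up to `window` characters on each side."""
    start, end = result["start"], result["end"]
    if context[start:end] != result["answer"]:
        raise ValueError("offsets do not match the answer text")
    left = max(0, start - window)
    right = min(len(context), end + window)
    return context[left:right]


context = "My name is Sylvain and I work at todaycode in seoul"

# Hand-built result, shaped like the output of pipeline("question-answering").
answer_text = "todaycode"
answer_start = context.find(answer_text)
sample_result = {
    "score": 0.5,  # placeholder value
    "start": answer_start,
    "end": answer_start + len(answer_text),
    "answer": answer_text,
}

snippet = answer_with_window(context, sample_result)
print(snippet)
```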

## summarization
* [sshleifer/distilbart-cnn-12-6 · Hugging Face](https://huggingface.co/sshleifer/distilbart-cnn-12-6)

In [None]:
from transformers import pipeline

summarizer = pipeline("summarization")
summarizer(
    """
    America has changed dramatically during recent years. Not only has the number of
    graduates in traditional engineering disciplines such as mechanical, civil,
    electrical, chemical, and aeronautical engineering declined, but in most of
    the premier American universities engineering curricula now concentrate on
    and encourage largely the study of engineering science. As a result, there
    are declining offerings in engineering subjects dealing with infrastructure,
    the environment, and related issues, and greater concentration on high
    technology subjects, largely supporting increasingly complex scientific
    developments. While the latter is important, it should not be at the expense
    of more traditional engineering.

    Rapidly developing economies such as China and India, as well as other
    industrial countries in Europe and Asia, continue to encourage and advance
    the teaching of engineering. Both China and India, respectively, graduate
    six and eight times as many traditional engineers as does the United States.
    Other industrial countries at minimum maintain their output, while America
    suffers an increasingly serious decline in the number of engineering graduates
    and a lack of well-educated engineers.
"""
)

* [gogamza/kobart-summarization · Hugging Face](https://huggingface.co/gogamza/kobart-summarization)

In [None]:
# The Korean input below is a passage from the Korean course text about top_k and mask tokens.
ko_summarizer = pipeline("summarization", model="gogamza/kobart-summarization")
ko_summarizer("상위 몇 개의 높은 확률을 띠는 토큰을 출력할지 top_k 인자를 통해 조절합니다. 여기서 모델이 특이한 <mask> 단어를 채우는 것을 주목하세요. 이를 마스크 토큰(mask token)이라고 부릅니다. 다른 마스크 채우기 모델들은 다른 형태의 마스크 토큰을 사용할 수 있기 때문에 다른 모델을 탐색할 때 항상 해당 모델의 마스크 단어가 무엇인지 확인해야 합니다. 위젯에서 사용되는 마스크 단어를 보고 이를 확인할 수 있습니다.")

## translation

In [None]:
from transformers import pipeline

translator = pipeline("translation", model="Helsinki-NLP/opus-mt-fr-en")
translator("Ce cours est produit par Hugging Face.")

In [None]:
ko_translator = pipeline("translation", model="Helsinki-NLP/opus-mt-ko-en")
# Korean input: "Hello."
ko_translator("안녕하세요.")