# Hugging Face

## 이선우 (20223888)

In [2]:
from transformers import pipeline

## Sentiment Analysis

In [3]:
classifier = pipeline('sentiment-analysis')

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english)


In [4]:
text = [
    "Fly me the moon, and let me play among the stars",
    "April is the cruellest month, breeding Lilacs out of the dead land"
]

In [5]:
classifier(text)

[{'label': 'POSITIVE', 'score': 0.9996002316474915},
 {'label': 'NEGATIVE', 'score': 0.9603950381278992}]

## Zero-shot classification

(Few-shot classification)

GPT : Generative Pre-Trained language model

In [6]:
classifier = pipeline('zero-shot-classification')

No model was supplied, defaulted to facebook/bart-large-mnli (https://huggingface.co/facebook/bart-large-mnli)


In [7]:
text = [
    "German finance minister urges EU to rein in public spending",
    "China seeks more island security pacts to boost clout in Pacific"
]

In [8]:
classifier(
    text,
    candidate_labels=[
        'education', 'politics', 'business',
        'economy', 'europe', 'asia'
    ]
)

[{'sequence': 'German finance minister urges EU to rein in public spending',
  'labels': ['europe', 'politics', 'economy', 'business', 'education', 'asia'],
  'scores': [0.40189388394355774,
   0.2552807033061981,
   0.24059978127479553,
   0.07709541916847229,
   0.016165118664503098,
   0.008965007029473782]},
 {'sequence': 'China seeks more island security pacts to boost clout in Pacific',
  'labels': ['politics', 'asia', 'business', 'economy', 'europe', 'education'],
  'scores': [0.5034076571464539,
   0.27671101689338684,
   0.14557312428951263,
   0.033448345959186554,
   0.02069881558418274,
   0.020161081105470657]}]

## Text Generation

In [9]:
generator = pipeline('text-generation')

No model was supplied, defaulted to gpt2 (https://huggingface.co/gpt2)


In [10]:
text = "There is but one truly serious philosophical problem, and that is suicide."

In [11]:
generator(text, max_lenght=256)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'There is but one truly serious philosophical problem, and that is suicide.\n\nA man might kill himself, and a man might not or really want to go to the doctor to try suicide. And so, while a man may have a mental illness'}]

## Mask filling

In [12]:
unmasker = pipeline('fill-mask')

No model was supplied, defaulted to distilroberta-base (https://huggingface.co/distilroberta-base)


In [13]:
# Billie Eilish
text = "So you're a <mask> guy, Like it really rough guy"

In [14]:
unmasker(text)

[{'score': 0.9274505972862244,
  'token': 6744,
  'token_str': ' rough',
  'sequence': "So you're a rough guy, Like it really rough guy"},
 {'score': 0.030521560460329056,
  'token': 1828,
  'token_str': ' tough',
  'sequence': "So you're a tough guy, Like it really rough guy"},
 {'score': 0.001765676774084568,
  'token': 1099,
  'token_str': ' bad',
  'sequence': "So you're a bad guy, Like it really rough guy"},
 {'score': 0.001683201640844345,
  'token': 15455,
  'token_str': ' nasty',
  'sequence': "So you're a nasty guy, Like it really rough guy"},
 {'score': 0.0015438725240528584,
  'token': 543,
  'token_str': ' hard',
  'sequence': "So you're a hard guy, Like it really rough guy"}]

## NER (Named Entity Recognition)

In [15]:
ner = pipeline('ner')

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english)


In [16]:
text = "Steven Paul Jobs (February 24, 1955 - October 5, 2011) was an American entrepreneur, inventor, business magnate, media proprietor, and investor."

In [17]:
ner(text)

[{'entity': 'I-PER',
  'score': 0.9995907,
  'index': 1,
  'word': 'Steven',
  'start': 0,
  'end': 6},
 {'entity': 'I-PER',
  'score': 0.999574,
  'index': 2,
  'word': 'Paul',
  'start': 7,
  'end': 11},
 {'entity': 'I-PER',
  'score': 0.9996289,
  'index': 3,
  'word': 'Job',
  'start': 12,
  'end': 15},
 {'entity': 'I-PER',
  'score': 0.9981768,
  'index': 4,
  'word': '##s',
  'start': 15,
  'end': 16},
 {'entity': 'I-MISC',
  'score': 0.99800843,
  'index': 18,
  'word': 'American',
  'start': 62,
  'end': 70}]

## Question answering

In [18]:
question_answerer = pipeline('question-answering')

No model was supplied, defaulted to distilbert-base-cased-distilled-squad (https://huggingface.co/distilbert-base-cased-distilled-squad)


In [19]:
question_answerer(
    context = text,
    question = "Which companies are founded by Steve Jobs?"
)

{'score': 0.6043979525566101,
 'start': 113,
 'end': 143,
 'answer': 'media proprietor, and investor'}

## Summarization

In [20]:
summarizer = pipeline("summarization")

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 (https://huggingface.co/sshleifer/distilbart-cnn-12-6)


In [21]:
summarizer(text, max_length=64)

Your max_length is set to 64, but you input_length is only 34. You might consider decreasing max_length manually, e.g. summarizer('...', max_length=17)


[{'summary_text': ' Steven Paul Jobs (February 24, 1955 - October 5, 2011) was an American entrepreneur, inventor, business magnate, media proprietor, and investor . Jobs died in 2011 at age 55 . Jobs was an inventor, entrepreneur, entrepreneur and media mogul . Jobs is credited with inventing and inventing'}]

## Translation

brew install cmake
!pip install sentencepiece

In [22]:
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-fr")

Downloading:   0%|          | 0.00/760k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/784k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/1.28M [00:00<?, ?B/s]

In [23]:
translator("Hello Jieun")

[{'translation_text': 'Bonjour Jieun'}]

In [24]:
translator(text)

[{'translation_text': 'Steven Paul Jobs (24 février 1955 - 5 octobre 2011) était un entrepreneur américain, inventeur, magnat des affaires, propriétaire de médias et investisseur.'}]

In [25]:
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-ko-en")

Downloading:   0%|          | 0.00/1.12k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/298M [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/44.0 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/822k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/794k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/1.64M [00:00<?, ?B/s]

In [26]:
text = "한국산 가상화폐 루나와 테라USD(UST) 폭락으로 손실을 본 투자자들이 발행사 테라폼랩스의 권도형 최고경영자(CEO)를 고소했다."

In [27]:
translator(text)

[{'translation_text': "After losing a Korean virtual currency, Luna turusD (UST), investors filed charges against CEO's high-powered top manager for the launch service terafos."}]

## Sentiment Analysis - Korean

In [28]:
classifier = pipeline('sentiment-analysis', model='snunlp/KR-FinBert-SC')

Downloading:   0%|          | 0.00/881 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/387M [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/372 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/140k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/287k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/112 [00:00<?, ?B/s]

In [29]:
text = [
    "한국산 가상화폐 루나와 테라USD(UST) 폭락으로 손실을 본 투자자들이 발행사 테라폼랩스의 권도형 최고경영자(CEO)를 고소했다.",
    "외국인, 올해 국내 주식 15조 원 순매도…삼성만 5조 원 팔았다",
    '尹, 탈원전 정상화 추진 “원전 수출 증진 위해 韓美 노력”',
]

In [30]:
classifier(text)

[{'label': 'negative', 'score': 0.9798453450202942},
 {'label': 'negative', 'score': 0.9699411988258362},
 {'label': 'positive', 'score': 0.995445728302002}]