# NLP with Hugging Face

## Loading Libraries

In [2]:
import transformers
import pandas as pd
from transformers import pipeline


## Sentiment Analysis

In [3]:
from urllib.request import urlopen

# Read the sample text (Shakespeare's Sonnet 18)
URL = (
    "https://data.heatonresearch.com/data/t81-558/"
    "datasets/sonnet_18.txt"
)
with urlopen(URL) as f:
    text = f.read().decode("utf-8")

In [4]:
classifier = pipeline("text-classification")

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
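
The log above recommends naming the model and revision explicitly. A minimal sketch of doing so, assuming the model name and short revision hash printed in that log are the ones to pin. `build_classifier` is a hypothetical helper name, and the `transformers` import is deferred so nothing is loaded or downloaded until the function is called:

```python
# Model name and revision copied from the log message above.
MODEL_NAME = "distilbert-base-uncased-finetuned-sst-2-english"
MODEL_REVISION = "af0f99b"


def build_classifier():
    """Return a text-classification pipeline pinned to a fixed model and revision."""
    # Imported lazily so that defining this helper does not load transformers.
    from transformers import pipeline

    return pipeline(
        "text-classification",
        model=MODEL_NAME,
        revision=MODEL_REVISION,
    )
```

Calling `classifier = build_classifier()` should then load the same checkpoint on every run, without the "No model was supplied" message.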


In [5]:
outputs = classifier(text)
pd.DataFrame(outputs)

|   | label | score |
|---|---|---|
| 0 | POSITIVE | 0.984666 |
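
The `score` value is the model's probability for the predicted label. For a model with two labels, the pipeline should by default apply a softmax to the raw logits. The sketch below illustrates that step in plain Python; the logits are made up for illustration and are not taken from the run above:

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    # Subtract the max for numerical stability before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for (NEGATIVE, POSITIVE); not taken from the actual run.
labels = ["NEGATIVE", "POSITIVE"]
logits = [-2.0, 2.2]

probs = softmax(logits)
best = max(range(len(labels)), key=lambda i: probs[i])
print({"label": labels[best], "score": round(probs[best], 6)})
```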


## Entity Tagging

Named-entity recognition (NER) takes source text and finds the spans that represent entities, e.g. a person or a location.

In [6]:
swift = "Taylor Swift is a famous singer from the United States"
tagger = pipeline("ner", aggregation_strategy = "simple")

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [None]:
pd.DataFrame(tagger(swift))

|   | entity_group | score | word | start | end |
|---|---|---|---|---|---|
| 0 | PER | 0.999482 | Taylor Swift | 0 | 12 |
| 1 | LOC | 0.999598 | United States | 41 | 54 |
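
The `start` and `end` columns are character offsets into the input string. A short sketch, using the rows from the table above with scores omitted, shows that slicing the source text with them recovers each entity:

```python
swift = "Taylor Swift is a famous singer from the United States"

# Entity rows copied from the table above (scores omitted).
entities = [
    {"entity_group": "PER", "word": "Taylor Swift", "start": 0, "end": 12},
    {"entity_group": "LOC", "word": "United States", "start": 41, "end": 54},
]

for ent in entities:
    # start is inclusive and end is exclusive, so a plain slice recovers the span.
    span = swift[ent["start"]:ent["end"]]
    print(ent["entity_group"], repr(span), span == ent["word"])
```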


## Question Answering

In [None]:
reader = pipeline("question-answering")
question = "What now shall fade?"

No model was supplied, defaulted to distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [None]:
outputs = reader(question = question, context = text)
pd.DataFrame([outputs])

|   | score | start | end | answer |
|---|---|---|---|---|
| 0 | 0.471141 | 414 | 428 | eternal summer |


## Language Translation

In [None]:
translator = pipeline("translation_en_to_de", model = "Helsinki-NLP/opus-mt-en-de")
outputs = translator(text, clean_up_tokenization_spaces = True, min_length = 100)
print(outputs[0]["translation_text"])

Sonnet 18 Originaltext William Shakespeare Soll ich dich mit einem Sommertag vergleichen? Du bist schöner und gemäßigter: Raue Winde schütteln die lieblichen Knospen des Mai, Und der Sommervertrag hat zu kurz ein Datum: Irgendwann zu heiß das Auge des Himmels leuchtet, Und oft ist sein Gold Teint dimm'd; Und jede faire von Fair irgendwann sinkt, Durch Zufall oder die Natur wechselnden Kurs untrimm'd; Aber dein ewiger Sommer wird nicht verblassen noch verlieren Besitz von dem Schönen du schuld; noch wird der Tod prahlen du wandert in seinem Schatten, Wenn in ewigen Linien zur Zeit wachsen: So lange die Menschen atmen oder Augen sehen können, So lange lebt dies und dies gibt dir Leben.


## Summarization

In [None]:
text2 = """
An apple is an edible fruit produced by an apple tree (Malus domestica). 
Apple trees are cultivated worldwide and are the most widely grown species 
in the genus Malus. The tree originated in Central Asia, where its wild 
ancestor, Malus sieversii, is still found today. Apples have been grown 
for thousands of years in Asia and Europe and were brought to North America 
by European colonists. Apples have religious and mythological significance 
in many cultures, including Norse, Greek, and European Christian tradition.
"""

In [None]:
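summarizer = pipeline("summarization")
# max_length=45 is lower than the model's default min_length=56 (hence the
# warning in the output), and the summary stops at the 45-token limit.
outputs = summarizer(text2, max_length=45, clean_up_tokenization_spaces=True)
print(outputs[0]["summary_text"])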

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.
Your min_length=56 must be inferior than your max_length=45.


 An apple is an edible fruit produced by an apple tree (Malus domestica) Apple trees are cultivated worldwide and are the most widely grown species in the genus Malus. Apples have religious and mythological
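
The warning in the output above appears because `max_length=45` is lower than the model's default `min_length` of 56. A minimal sketch of one way to avoid it is to pass both bounds explicitly and check them first. `summarize` is a hypothetical helper, the default `min_length=10` is an arbitrary illustrative value, and the model name is the one reported in the log:

```python
MODEL_NAME = "sshleifer/distilbart-cnn-12-6"  # default model named in the log above


def summarize(text, min_length=10, max_length=45):
    """Summarize text with explicit, consistent length bounds (in tokens)."""
    if min_length >= max_length:
        raise ValueError(
            f"min_length={min_length} must be smaller than max_length={max_length}"
        )
    # Imported lazily so the argument check above works without loading a model.
    from transformers import pipeline

    summarizer = pipeline("summarization", model=MODEL_NAME)
    outputs = summarizer(
        text,
        min_length=min_length,
        max_length=max_length,
        clean_up_tokenization_spaces=True,
    )
    return outputs[0]["summary_text"]
```

With these bounds `summarize(text2)` should run without the length warning. A larger `max_length` may still be needed if the summary is meant to end on a complete sentence.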


## Text Generation

In [None]:
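# The recorded run below failed because `pipeline` was not in scope; import it first.
from transformers import pipeline

generator = pipeline("text-generation")
outputs = generator(text, max_length=400)
print(outputs[0]["generated_text"])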

NameError: name 'pipeline' is not defined