# Pipeline function
With the `pipeline()` function of the Transformers package, many NLP tasks can be solved in one or two lines of code. \
`pipeline()` automatically selects a default model for the given task.
We will look at the following tasks:


*   Sequence classification (pl. sentiment-analysis)
*   Extractive Question Answering
*   Masked Language Modeling
*   Text Generation
*   NER (Named Entity Recognition)
*   Summarization
*   Translation









## Installation

In [None]:
!pip install transformers datasets

## Sequence Classification
The sentiment-analysis task classifies a text as positive or negative. \
In the line `result = classifier("I hate you")[0]`, the `[0]` is needed because the classifier returns a list, and we take its first element. \
This element is a dictionary with "label" and "score" keys. \
Had we passed several sentences, the list would contain one dictionary per sentence.
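
The list-of-dictionaries return value can be handled for several sentences at once. The sketch below assumes only what is described above: calling the classifier with a list of texts returns one dictionary per text, each with "label" and "score" keys. `label_texts` and `stub_classifier` are hypothetical names; the stub stands in for the real pipeline so the example runs without downloading a model, and its scores are placeholders, not model output.

```python
def label_texts(classifier, texts):
    """Run a classifier over several texts and pair each text with its prediction."""
    results = classifier(texts)  # expected: one dictionary per input text
    return [(text, r["label"], round(r["score"], 4)) for text, r in zip(texts, results)]


def stub_classifier(texts):
    # Hypothetical stand-in that mimics the output shape of the pipeline.
    # The 0.99 score is a placeholder, not a real model output.
    return [
        {"label": "NEGATIVE" if "hate" in text else "POSITIVE", "score": 0.99}
        for text in texts
    ]


rows = label_texts(stub_classifier, ["I hate you", "I love you"])
for text, label, score in rows:
    print(f"{text!r} -> label: {label}, with score: {score}")
```

To try it with a real model, pass the object returned by `pipeline("sentiment-analysis")` in place of `stub_classifier`.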

In [None]:
from transformers import pipeline

classifier = pipeline("sentiment-analysis")

result = classifier("I hate you")[0]
print(f"label: {result['label']}, with score: {round(result['score'], 4)}")

result = classifier("I love you")[0]
print(f"label: {result['label']}, with score: {round(result['score'], 4)}")

## Extractive Question Answering
The pipeline automatically loads a model that extracts the answer to a given question from a text. \
An example of a question answering dataset is SQuAD. \
The model it uses is a BERT-family model fine-tuned on SQuAD (the current default checkpoint is a DistilBERT).
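
Because the task is extractive, the answer is a span of the context rather than newly generated text. The sketch below illustrates this under the assumption that the "start" and "end" values in the result are character offsets into the context. The `result` dictionary is built by hand to mimic the output shape; its score is a placeholder, and `answer_from_offsets` is a hypothetical helper.

```python
def answer_from_offsets(context, result):
    """Recover the answer text from the character offsets in a QA result."""
    return context[result["start"]:result["end"]]


context = (
    "Extractive Question Answering is the task of extracting an answer "
    "from a text given a question."
)

# Hand-built result mimicking the pipeline's output shape (score is a placeholder).
answer_text = "the task of extracting an answer from a text given a question"
start = context.find(answer_text)
result = {"answer": answer_text, "score": 0.5, "start": start, "end": start + len(answer_text)}

print(answer_from_offsets(context, result) == result["answer"])
```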

In [None]:
from transformers import pipeline

question_answerer = pipeline("question-answering")

context = r"""
Extractive Question Answering is the task of extracting an answer from a text given a question. An example of a
question answering dataset is the SQuAD dataset, which is entirely based on that task. If you would like to fine-tune
a model on a SQuAD task, you may leverage the examples/pytorch/question-answering/run_squad.py script.
"""
result = question_answerer(question="What is extractive question answering?", context=context)
print(
    f"Answer: '{result['answer']}', score: {round(result['score'], 4)}, start: {result['start']}, end: {result['end']}"
)

## Masked Language Modeling
After pretraining, a model is fine-tuned on a specific domain. Masked Language Modeling is useful for this. \
It masks an arbitrary word in a text, and the model's task is to fill the gap with a fitting word. \
`pprint` prints the output, which is a list, in a readable form. \
The list elements are dictionaries that contain the candidate words together with their score, token id, and so on.
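
To make the shape of that output concrete, here is a toy sketch of how candidate words could be ranked and substituted for the mask. It is not the library's implementation: `top_k_fills`, the `<mask>` placeholder, and the three-word vocabulary with its logits are all hypothetical. In the real pipeline the mask token comes from `unmasker.tokenizer.mask_token` and the scores come from the model.

```python
import math


def top_k_fills(template, mask_token, logits, k=2):
    """Rank candidate words by softmax probability and fill the mask with each."""
    highest = max(logits.values())
    # Subtract the maximum before exponentiating for numerical stability.
    exps = {word: math.exp(value - highest) for word, value in logits.items()}
    total = sum(exps.values())
    ranked = sorted(exps, key=exps.get, reverse=True)[:k]
    return [
        {
            "score": exps[word] / total,
            "token_str": word,
            "sequence": template.replace(mask_token, word),
        }
        for word in ranked
    ]


MASK = "<mask>"  # placeholder mask token for this toy example
toy_logits = {"tool": 2.0, "framework": 1.5, "banana": -1.0}  # made-up values
candidates = top_k_fills(f"HuggingFace is creating a {MASK} that the community uses.", MASK, toy_logits)

for candidate in candidates:
    print(candidate)
```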

In [None]:
from pprint import pprint

from transformers import pipeline

unmasker = pipeline("fill-mask")

pprint(
    unmasker(
        f"HuggingFace is creating a {unmasker.tokenizer.mask_token} that the community uses to solve NLP tasks."
    )
)

## Text Generation
Here the pipeline loads a model that continues the sentence we started. \
The `max_length` parameter sets the maximum length of the generated text in tokens, including the prompt. \
`do_sample` controls how the model chooses the next token. If it is False, the model always picks the most probable token (greedy decoding). If it is True, the model samples from the predicted distribution, so the text differs a little on every run.
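
The difference between the two `do_sample` settings can be sketched on a single decoding step. This is a simplified illustration, not the library's generation code: `pick_next_token` is a hypothetical helper, and the three-token distribution is made up.

```python
import random


def pick_next_token(probs, do_sample, rng=None):
    """Choose the next token from a {token: probability} mapping."""
    if not do_sample:
        # Greedy decoding: always take the most probable token.
        return max(probs, key=probs.get)
    # Sampling: draw a token at random, weighted by its probability.
    rng = rng or random
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


# Made-up distribution over possible continuations of "I will".
next_token_probs = {"be": 0.5, "go": 0.3, "try": 0.2}

print(pick_next_token(next_token_probs, do_sample=False))
print([pick_next_token(next_token_probs, do_sample=True) for _ in range(5)])
```

With `do_sample=False` the first line of output is the same on every run, while the sampled list generally changes between runs.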

In [None]:
from transformers import pipeline

text_generator = pipeline("text-generation")
print(text_generator("As far as I am concerned, I will", max_length=50, do_sample=False))

## Named Entity Recognition
NER recognizes which words in a sentence refer to entities such as locations, persons, or organizations.

With the `aggregation_strategy="simple"` parameter, the output groups the tokens that belong to the same entity instead of listing each token separately.
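
The idea behind grouping can be sketched with a small function that merges consecutive tokens of the same entity type. This is a simplified illustration, not the library's implementation, which also deals with subword pieces, scores, and character offsets. `group_entities` here is a hypothetical helper, and the token tags are written by hand.

```python
def group_entities(tagged_tokens):
    """Merge consecutive tokens that carry the same entity type into one group."""
    groups = []
    current = None
    for word, tag in tagged_tokens:
        if tag == "O":  # token outside any entity: close the open group
            current = None
            continue
        prefix, entity_type = tag.split("-", 1)
        if current is not None and prefix == "I" and current["entity_group"] == entity_type:
            current["word"] += " " + word  # continue the open entity
        else:
            current = {"entity_group": entity_type, "word": word}  # start a new entity
            groups.append(current)
    return groups


# Hand-written token-level tags (not real model output).
tagged = [
    ("Hugging", "B-ORG"), ("Face", "I-ORG"), ("Inc", "I-ORG"),
    ("is", "O"), ("based", "O"), ("in", "O"),
    ("New", "B-LOC"), ("York", "I-LOC"), ("City", "I-LOC"),
]

for entity in group_entities(tagged):
    print(entity)
```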

In [None]:
from transformers import pipeline

ner_pipe = pipeline("ner", aggregation_strategy="simple")

sequence = """Hugging Face Inc. is a company based in New York City. Its headquarters are in DUMBO,
therefore very close to the Manhattan Bridge which is visible from the window."""

for entity in ner_pipe(sequence):
    print(entity)

## Summarization
Summarizes a long text. \
The `max_length` and `min_length` parameters set the maximum and minimum length of the summary. `pipeline("summarization")` relies on the `PreTrainedModel.generate()` method, so these parameters can be passed through to it.
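
The summarizer returns a list of dictionaries, so it can be convenient to wrap the call and pull out the summary string. The sketch below assumes the result stores the text under the "summary_text" key. `summarize` and `stub_summarizer` are hypothetical names; the stub only truncates by words so the example runs without a model, whereas the real pipeline measures length in tokens.

```python
def summarize(summarizer, text, **generate_kwargs):
    """Call a summarizer and return only the summary string."""
    # Keyword arguments such as max_length and min_length are passed through unchanged.
    return summarizer(text, **generate_kwargs)[0]["summary_text"]


def stub_summarizer(text, max_length=20, min_length=0, do_sample=False):
    # Hypothetical stand-in: keeps the first max_length words.
    # A real model generates new text and counts tokens, not words.
    words = text.split()
    return [{"summary_text": " ".join(words[:max_length])}]


summary = summarize(
    stub_summarizer,
    "Barrientos has been married 10 times, with nine of her marriages occurring between 1999 and 2002.",
    max_length=6,
    min_length=3,
    do_sample=False,
)
print(summary)
```

With the real pipeline, `summarize(summarizer, ARTICLE, max_length=130, min_length=30, do_sample=False)` would return the summary text directly instead of a one-element list.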

In [None]:
from transformers import pipeline

summarizer = pipeline("summarization")

ARTICLE = """ New York (CNN)When Liana Barrientos was 23 years old, she got married in Westchester County, New York.
A year later, she got married again in Westchester County, but to a different man and without divorcing her first husband.
Only 18 days after that marriage, she got hitched yet again. Then, Barrientos declared "I do" five more times, sometimes only within two weeks of each other.
In 2010, she married once more, this time in the Bronx. In an application for a marriage license, she stated it was her "first and only" marriage.
Barrientos, now 39, is facing two criminal counts of "offering a false instrument for filing in the first degree," referring to her false statements on the
2010 marriage license application, according to court documents.
Prosecutors said the marriages were part of an immigration scam.
On Friday, she pleaded not guilty at State Supreme Court in the Bronx, according to her attorney, Christopher Wright, who declined to comment further.
After leaving court, Barrientos was arrested and charged with theft of service and criminal trespass for allegedly sneaking into the New York subway through an emergency exit, said Detective
Annette Markowski, a police spokeswoman. In total, Barrientos has been married 10 times, with nine of her marriages occurring between 1999 and 2002.
All occurred either in Westchester County, Long Island, New Jersey or the Bronx. She is believed to still be married to four men, and at one time, she was married to eight men at once, prosecutors say.
Prosecutors said the immigration scam involved some of her husbands, who filed for permanent residence status shortly after the marriages.
Any divorces happened only after such filings were approved. It was unclear whether any of the men will be prosecuted.
The case was referred to the Bronx District Attorney\'s Office by Immigration and Customs Enforcement and the Department of Homeland Security\'s
Investigation Division. Seven of the men are from so-called "red-flagged" countries, including Egypt, Turkey, Georgia, Pakistan and Mali.
Her eighth husband, Rashid Rajput, was deported in 2006 to his native Pakistan after an investigation by the Joint Terrorism Task Force.
If convicted, Barrientos faces up to four years in prison.  Her next court appearance is scheduled for May 18.
"""
print(summarizer(ARTICLE, max_length=130, min_length=30, do_sample=False))

## Translation

In [None]:
from transformers import pipeline

translator = pipeline("translation_en_to_fr")
print(translator("Hugging Face is a technology company based in New York and Paris", max_length=40))