# Pipelines
This notebook gives a short overview of transformer pipelines.

In [2]:
from transformers import pipeline

## Zero Shot Learning
In real-world projects we often need to classify texts that haven't been labelled.
But annotating is time-consuming and not always the best approach.

The `zero-shot-classification` pipeline:
- lets us specify our own labels
- scores the text against each of these labels

In [4]:
# zero shot learning pipeline
zero_shot = pipeline("zero-shot-classification")

# Let's classify text based on our labels
zero_shot(
    "I am studying how LLMs work. A LLM is a Large Language Model that is trained on vast amount of text data",
    candidate_labels=["computer", "art", "sport"],
)

No model was supplied, defaulted to facebook/bart-large-mnli and revision d7645e1 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu


{'sequence': 'I am studying how LLMs work. A LLM is a Large Language Model that is trained on vast amount of text data',
 'labels': ['computer', 'art', 'sport'],
 'scores': [0.8656079769134521, 0.07270870357751846, 0.06168331205844879]}
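
The result is a plain dictionary in which `labels` and `scores` are aligned lists. A minimal sketch of reading it, using the output above as sample data; the helper name `labels_above` and the 0.5 threshold are illustrative choices, not part of the `transformers` API.

```python
# Sample data copied from the zero-shot output above.
zs_result = {
    "sequence": "I am studying how LLMs work. A LLM is a Large Language Model "
                "that is trained on vast amount of text data",
    "labels": ["computer", "art", "sport"],
    "scores": [0.8656079769134521, 0.07270870357751846, 0.06168331205844879],
}


def labels_above(result, threshold):
    """Return (label, score) pairs whose score is at least `threshold`."""
    pairs = zip(result["labels"], result["scores"])
    return [(label, score) for label, score in pairs if score >= threshold]


# Pick the best label without relying on the order of the lists.
best_label, best_score = max(
    zip(zs_result["labels"], zs_result["scores"]), key=lambda pair: pair[1]
)
print(best_label, round(best_score, 3))  # computer 0.866
print(labels_above(zs_result, threshold=0.5))
```

A threshold like this is one simple way to fall back to "no label" when none of the candidate labels fits the text well.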

## Text Generation
The `text-generation` pipeline generates text from a given prompt:
- we provide a prompt
- the model autocompletes the rest

**Arguments**:
- `num_return_sequences`: how many different sequences to generate
- `max_length`: maximum total length of the output in tokens, prompt included

In [8]:
# text generation pipeline
text_gen = pipeline("text-generation")

# generate text
text_gen("Tell me about Japan", num_return_sequences=2)

No model was supplied, defaulted to openai-community/gpt2 and revision 607a30d (https://huggingface.co/openai-community/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': "Explain me about Large Language Models in short \xa0short posts.\nThis is a really fun post, but I'm going to let you start reading this post and get excited about it.\nFirst off, I want to say this: I'm not making assumptions about how languages work. I'm not saying that languages don't have some kind of set of rules. I am not saying that your language is a set of rules. I'm just saying that using a set of languages helps me understand some of the concepts that you are seeing in your language.\nThe problem is, these ideas are very common. I can't teach you this stuff in my language, but I can tell you I do it, and that's what I want you to do.\nI hope you enjoyed this post. It's a post that I write about writing languages that can be more than just a set of rules. It's a post that I write to let you know that there is a place for you to learn more about the concepts that make your language so special.\nI hope you enjoyed it, and I hope that this post inspires you 
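
Each item in the returned list is a dictionary with a `generated_text` key, and by default that text starts with the prompt itself. A small sketch of keeping only the continuation; `strip_prompt` is a hypothetical helper and `sample_outputs` is made-up data that only mimics the shape of the pipeline output.

```python
def strip_prompt(prompt, outputs):
    """Return only the newly generated continuation of each output."""
    continuations = []
    for item in outputs:
        text = item["generated_text"]
        # The pipeline echoes the prompt at the start of the generated text.
        if text.startswith(prompt):
            text = text[len(prompt):]
        continuations.append(text.strip())
    return continuations


# Made-up outputs with the same structure as the pipeline result (not real model output).
sample_outputs = [
    {"generated_text": "Tell me about Japan and its history."},
    {"generated_text": "Tell me about Japan, please."},
]
print(strip_prompt("Tell me about Japan", sample_outputs))
```

As far as I know, passing `return_full_text=False` to the pipeline call achieves the same thing on the library side.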

## Mask Filling
Works like fill-in-the-blanks: the model guesses which word best replaces `<mask>`, the mask token of the default model used here.

**Arguments**:
- `top_k`: how many candidate words to return

In [11]:
# mask filling pipeline
mask_filling = pipeline("fill-mask")

# This works like fill-in-the-blanks.
# Use <mask> to mark the blank
mask_filling(
    "Natural Language Processing (NLP) has <mask> the way we interact with technology, making communication more intuitive and efficient.",
    top_k=2,
)

No model was supplied, defaulted to distilbert/distilroberta-base and revision fb53ab8 (https://huggingface.co/distilbert/distilroberta-base).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at distilbert/distilroberta-base were not used when initializing RobertaForMaskedLM: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use cpu


[{'score': 0.4017307162284851,
  'token': 11229,
  'token_str': ' transformed',
  'sequence': 'Natural Language Processing (NLP) has transformed the way we interact with technology, making communication more intuitive and efficient.'},
 {'score': 0.3558560013771057,
  'token': 1714,
  'token_str': ' changed',
  'sequence': 'Natural Language Processing (NLP) has changed the way we interact with technology, making communication more intuitive and efficient.'}]
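
Note that `token_str` keeps a leading space, because the default model's tokenizer treats the preceding space as part of the token. A minimal sketch of turning the result into clean (word, score) pairs, using the output above as sample data; `candidate_words` is a hypothetical helper name.

```python
# Sample data copied from the fill-mask output above ('sequence' omitted for brevity).
predictions = [
    {"score": 0.4017307162284851, "token": 11229, "token_str": " transformed"},
    {"score": 0.3558560013771057, "token": 1714, "token_str": " changed"},
]


def candidate_words(predictions):
    """Return (word, score) pairs with the tokenizer's leading space removed."""
    return [(p["token_str"].strip(), p["score"]) for p in predictions]


for word, score in candidate_words(predictions):
    print(f"{word:<12} {score:.1%}")
```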

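## Named Entity Recognition
Finds named entities in a text, such as organizations and locations, and labels each one with its type.

In [18]:
# NER pipeline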
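# aggregation_strategy="simple" merges sub-word tokens into whole entities
# (it replaces the deprecated grouped_entities=True)
ner = pipeline("ner", aggregation_strategy="simple")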

# Let's find Named Entities in a sentence
ner("OpenAI, Inc. is an American artificial intelligence (AI) organization founded in December 2015 and headquartered in San Francisco, California.")

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision 4c53496 (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use cpu


[{'entity_group': 'ORG',
  'score': np.float32(0.9967308),
  'word': 'OpenAI, Inc',
  'start': 0,
  'end': 11},
 {'entity_group': 'MISC',
  'score': np.float32(0.99758005),
  'word': 'American',
  'start': 19,
  'end': 27},
 {'entity_group': 'MISC',
  'score': np.float32(0.53491396),
  'word': 'AI',
  'start': 53,
  'end': 55},
 {'entity_group': 'LOC',
  'score': np.float32(0.9982227),
  'word': 'San Francisco',
  'start': 116,
  'end': 129},
 {'entity_group': 'LOC',
  'score': np.float32(0.9987657),
  'word': 'California',
  'start': 131,
  'end': 141}]
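
In each entity, `start` and `end` are character offsets into the input sentence, and `score` can be used to drop uncertain entities such as `AI` above. A small sketch using the output above as sample data; the scores are written as plain floats instead of `np.float32`, and `group_by_type` is a hypothetical helper.

```python
from collections import defaultdict

text = ("OpenAI, Inc. is an American artificial intelligence (AI) organization "
        "founded in December 2015 and headquartered in San Francisco, California.")

# Entities copied from the NER output above.
entities = [
    {"entity_group": "ORG", "score": 0.9967308, "word": "OpenAI, Inc", "start": 0, "end": 11},
    {"entity_group": "MISC", "score": 0.99758005, "word": "American", "start": 19, "end": 27},
    {"entity_group": "MISC", "score": 0.53491396, "word": "AI", "start": 53, "end": 55},
    {"entity_group": "LOC", "score": 0.9982227, "word": "San Francisco", "start": 116, "end": 129},
    {"entity_group": "LOC", "score": 0.9987657, "word": "California", "start": 131, "end": 141},
]


def group_by_type(entities, min_score=0.0):
    """Group entity words by type, keeping only entities scoring at least `min_score`."""
    grouped = defaultdict(list)
    for ent in entities:
        if ent["score"] >= min_score:
            grouped[ent["entity_group"]].append(ent["word"])
    return dict(grouped)


print(group_by_type(entities, min_score=0.9))

# Slicing the text with the offsets recovers each entity.
for ent in entities:
    print(ent["entity_group"], repr(text[ent["start"]:ent["end"]]))
```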

## Question Answering
Extracts the answer to a question from a given context passage.

In [19]:
# question-answering pipeline
qa = pipeline("question-answering")

# Let's ask a question based on the context
# We need to provide both the question and the context
qa(
    question="When was OpenAI founded ?",
    context="OpenAI, Inc. is an American artificial intelligence (AI) organization founded in December 2015 and headquartered in San Francisco, California.",
)

No model was supplied, defaulted to distilbert/distilbert-base-cased-distilled-squad and revision 564e9b5 (https://huggingface.co/distilbert/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu


{'score': 0.9552261233329773,
 'start': 81,
 'end': 94,
 'answer': 'December 2015'}
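
The answer is extracted from the context rather than generated, so `start` and `end` point at the exact characters it was taken from. A minimal sketch that shows the answer with some surrounding text, using the output above as sample data; `answer_with_context` and its `window` parameter are hypothetical.

```python
context = ("OpenAI, Inc. is an American artificial intelligence (AI) organization "
           "founded in December 2015 and headquartered in San Francisco, California.")

# Result copied from the question-answering output above.
qa_result = {"score": 0.9552261233329773, "start": 81, "end": 94, "answer": "December 2015"}


def answer_with_context(context, result, window=20):
    """Return the answer plus up to `window` characters of context on each side."""
    left = max(0, result["start"] - window)
    right = min(len(context), result["end"] + window)
    return context[left:right]


# The offsets slice the answer straight out of the context.
print(context[qa_result["start"]:qa_result["end"]])
print(answer_with_context(context, qa_result))
```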

## Summarization
Condenses a longer passage into a short summary.

In [20]:
# summarization pipeline
summary = pipeline("summarization")

# generate summary of passage
summary("""

NLP enables computers and digital devices to recognize, understand and generate text and speech by combining computational linguistics, the rule-based modeling of human language together with statistical modeling, machine learning and deep learning.

NLP research has helped enable the era of generative AI, from the communication skills of large language models (LLMs) to the ability of image generation models to understand requests. NLP is already part of everyday life for many, powering search engines, prompting chatbots for customer service with spoken commands, voice-operated GPS systems and question-answering digital assistants on smartphones such as Amazon’s Alexa, Apple’s Siri and Microsoft’s Cortana.

NLP also plays a growing role in enterprise solutions that help streamline and automate business operations, increase employee productivity and simplify business processes.
""")

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu


[{'summary_text': ' NLP research has helped enable the era of generative AI, from the communication skills of large language models to the ability of image generation models to understand requests . NLP is already part of everyday life for many, powering search engines, prompting chatbots for customer service with spoken commands .'}]
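
The summary above starts with a space and has spaces before its periods. A small clean-up sketch, using that output as sample data; `tidy_summary` is a hypothetical helper, and the regular expression is just one simple way to do it.

```python
import re

# Sample data copied from the summarization output above.
summary_output = [{
    "summary_text": " NLP research has helped enable the era of generative AI, from the "
                    "communication skills of large language models to the ability of image "
                    "generation models to understand requests . NLP is already part of "
                    "everyday life for many, powering search engines, prompting chatbots "
                    "for customer service with spoken commands ."
}]


def tidy_summary(text):
    """Strip outer whitespace and remove spaces before punctuation marks."""
    return re.sub(r"\s+([.,;:!?])", r"\1", text.strip())


clean = tidy_summary(summary_output[0]["summary_text"])
print(clean)
```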

## Translation
Translates text from one language to another, here from English to French.

In [21]:
# translation pipeline
translation = pipeline("translation_en_to_fr")

translation("I am studying how LLMs work. A LLM is a Large Language Model that is trained on vast amount of text data")

No model was supplied, defaulted to google-t5/t5-base and revision a9723ea (https://huggingface.co/google-t5/t5-base).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu


[{'translation_text': 'Je suis en train d’étudier comment fonctionnent les LLM. Un LLM est un modèle langagier de grande envergure qui est formé sur une vaste quantité de données textuelles.'}]
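
Every cell in this notebook logs that no model was supplied and that relying on the default is not recommended in production. One way to address this is to name the model explicitly through the `model` argument of `pipeline`. The sketch below collects the defaults reported in the logs above; `DEFAULT_MODELS` and `make_pipeline` are hypothetical names, and it assumes the same `transformers` version as this notebook.

```python
# Default checkpoints reported in the log messages of this notebook.
DEFAULT_MODELS = {
    "zero-shot-classification": "facebook/bart-large-mnli",
    "text-generation": "openai-community/gpt2",
    "fill-mask": "distilbert/distilroberta-base",
    "ner": "dbmdz/bert-large-cased-finetuned-conll03-english",
    "question-answering": "distilbert/distilbert-base-cased-distilled-squad",
    "summarization": "sshleifer/distilbart-cnn-12-6",
    "translation_en_to_fr": "google-t5/t5-base",
}


def make_pipeline(task, **kwargs):
    """Create a pipeline with the model named explicitly instead of the task default."""
    # Imported lazily so the mapping above can be inspected without loading transformers.
    from transformers import pipeline
    return pipeline(task, model=DEFAULT_MODELS[task], **kwargs)


# Usage (downloads the checkpoint on first call):
# translation = make_pipeline("translation_en_to_fr")
# translation("I am studying how LLMs work.")
```

Swapping in a different checkpoint then only requires changing one entry in the mapping.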