# Working with Pipeline

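The `pipeline()` function from Hugging Face Transformers bundles a tokenizer, a pretrained model, and task-specific post-processing. This lets us pass in raw text and get the corresponding prediction back directly.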

In [3]:
!pip install transformers[sentencepiece]

Collecting transformers[sentencepiece]
  Downloading transformers-4.31.0-py3-none-any.whl (7.4 MB)
Collecting huggingface-hub<1.0,>=0.14.1 (from transformers[sentencepiece])
  Downloading huggingface_hub-0.16.4-py3-none-any.whl (268 kB)
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1 (from transformers[sentencepiece])
  Downloading tokenizers-0.13.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.8 MB)
Collecting safetensors>=0.3.1 (from transformers[sentencepiece])
  Downloading safetensors-0.3.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)

In [4]:
# importing pipeline function
import transformers
from transformers import pipeline

### Sentiment Analysis

In [5]:
classifier = pipeline("sentiment-analysis")

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
Xformers is not installed correctly. If you want to use memory_efficient_attention to accelerate training use the following command to install Xformers
pip install xformers.
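The warning above recommends naming a model and revision explicitly instead of relying on task defaults. Here is a minimal sketch that assumes we want to reproduce the defaults this notebook resolved to. The helper name `pinned_kwargs` is hypothetical. The model IDs and short revision hashes are copied verbatim from the "No model was supplied, defaulted to ..." messages in this notebook. `model` and `revision` are real `pipeline()` arguments.

```python
# Defaults reported in this notebook's logs ("No model was supplied, defaulted to ...").
LOGGED_DEFAULTS = {
    "sentiment-analysis": ("distilbert-base-uncased-finetuned-sst-2-english", "af0f99b"),
    "zero-shot-classification": ("facebook/bart-large-mnli", "c626438"),
    "text-generation": ("gpt2", "6c0e608"),
    "fill-mask": ("distilroberta-base", "ec58a5b"),
    "ner": ("dbmdz/bert-large-cased-finetuned-conll03-english", "f2482bf"),
    "question-answering": ("distilbert-base-cased-distilled-squad", "626af31"),
    "summarization": ("sshleifer/distilbart-cnn-12-6", "a4f8f3e"),
}


def pinned_kwargs(task: str) -> dict:
    """Return explicit model/revision kwargs for pipeline(task, **kwargs).

    Hypothetical helper: raises KeyError for tasks not logged in this notebook.
    """
    model, revision = LOGGED_DEFAULTS[task]
    return {"model": model, "revision": revision}


# Usage (downloads the model, so it is left as a comment here):
# classifier = pipeline("sentiment-analysis", **pinned_kwargs("sentiment-analysis"))
```

With the model pinned, rerunning the notebook later should not silently switch to a different default checkpoint.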


In [6]:
classifier("2322 students have received the scholarship for class 11 in the first phase of the scholarship program conducted by kathmandu Metro.")

[{'label': 'POSITIVE', 'score': 0.9824532270431519}]
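The `score` is the softmax probability of the winning class. Below is a minimal sketch of that post-processing step. It assumes two raw logits ordered like this checkpoint's labels (index 0 is `NEGATIVE`, index 1 is `POSITIVE`). The logit values are invented for illustration, not taken from the model, and `to_prediction` is a hypothetical name.

```python
import math

ID2LABEL = {0: "NEGATIVE", 1: "POSITIVE"}  # label order assumed for this checkpoint


def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def to_prediction(logits):
    """Turn raw class logits into the {'label', 'score'} dict a pipeline returns."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": ID2LABEL[best], "score": probs[best]}


# Hypothetical logits, not real model output.
print(to_prediction([-2.0, 2.0]))  # label POSITIVE, score ~0.982
```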

In [7]:
# passing several sentences.
classifier(
    ["Messi wins his first trophy in the USA as Inter Miami wins the Leagues Cup.",
     "382 Divorce cases have been registered in the baglung district in the last fiscal year."]
)

[{'label': 'POSITIVE', 'score': 0.9996962547302246},
 {'label': 'POSITIVE', 'score': 0.5399848818778992}]
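A two-class model always has to answer `POSITIVE` or `NEGATIVE`. A score close to 0.5, like the 0.54 for the divorce-case headline, therefore signals that the model is unsure; it does not mean the sentence is positive. One way to surface this is to flag low-confidence results. The sketch below uses a hypothetical helper and an arbitrary 0.6 threshold, applied to the outputs copied from the cell above.

```python
def flag_uncertain(results, threshold=0.6):
    """Return indices of predictions whose score falls below `threshold`.

    The 0.6 cut-off is an arbitrary example value, not a library default.
    """
    return [i for i, r in enumerate(results) if r["score"] < threshold]


# Outputs copied from the cell above.
results = [
    {"label": "POSITIVE", "score": 0.9996962547302246},
    {"label": "POSITIVE", "score": 0.5399848818778992},
]
print(flag_uncertain(results))  # [1]
```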

### Zero-Shot Classification

In [8]:
# Zero-shot classification: score text against labels we choose at call time.
zsclf = pipeline("zero-shot-classification")

No model was supplied, defaulted to facebook/bart-large-mnli and revision c626438 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [9]:
text = [
    "2322 students have received the scholarship for class 11 in the first phase of the scholarship program conducted by kathmandu Metro.",
    "Messi wins his first trophy in the USA as Inter Miami wins the Leagues Cup.",
    "382 Divorce cases have been registered in the baglung district in the last fiscal year.",
]
zsclf(text, candidate_labels=["education", "politics", "statistics"])

[{'sequence': '2322 students have received the scholarship for class 11 in the first phase of the scholarship program conducted by kathmandu Metro.',
  'labels': ['education', 'statistics', 'politics'],
  'scores': [0.5448172688484192, 0.4070856273174286, 0.04809711501002312]},
 {'sequence': 'Messi wins his first trophy in the USA as Inter Miami wins the Leagues Cup.',
  'labels': ['statistics', 'politics', 'education'],
  'scores': [0.7865137457847595, 0.11511396616697311, 0.09837234020233154]},
 {'sequence': '382 Divorce cases have been registered in the baglung district in the last fiscal year.',
  'labels': ['statistics', 'politics', 'education'],
  'scores': [0.9647744297981262, 0.02256729081273079, 0.012658334337174892]}]
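Under the hood, the zero-shot pipeline reuses a natural-language-inference model (here `facebook/bart-large-mnli`). Each candidate label is inserted into a hypothesis using the default template "This example is {}.", and the model scores whether the input text entails that hypothesis. In the default single-label mode, the entailment logits of all labels are softmaxed together, which is why each row of scores above sums to 1.

The sketch below shows only that final scoring step. The entailment logits are invented, and `zero_shot_scores` is a hypothetical name.

```python
import math


def zero_shot_scores(entailment_logits, labels):
    """Softmax entailment logits across labels and sort them, as in single-label mode."""
    m = max(entailment_logits)
    exps = [math.exp(x - m) for x in entailment_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return {"labels": [label for label, _ in ranked], "scores": [p for _, p in ranked]}


labels = ["education", "politics", "statistics"]
hypotheses = [f"This example is {label}." for label in labels]  # default template
# Hypothetical entailment logits, one per (text, hypothesis) pair.
result = zero_shot_scores([2.1, -0.3, 1.8], labels)
print(result["labels"])  # ['education', 'statistics', 'politics']
```

Passing `multi_label=True` to the pipeline instead scores each label independently, so the scores no longer have to sum to 1.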

### Text Generation

In [10]:
gen = pipeline("text-generation")

No model was supplied, defaulted to gpt2 and revision 6c0e608 (https://huggingface.co/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [11]:
gen("Messi wins his first trophy in the USA as")

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'Messi wins his first trophy in the USA as a 16 year old.\n\nLazio wins his second career world ranking and he gets to have his first World Cup after being elected to USA national team in the tournament.\n\nTottenham'}]
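GPT-2 writes one token at a time. The continuations in this notebook appear to be sampled rather than chosen greedily: the distilgpt2 cell below returns two different sequences for the same prompt. So re-running a cell will usually give different text.

Here is a toy sketch of top-k sampling, one common strategy: keep the k most likely next tokens, renormalize, and draw one. The vocabulary and logits are made up. `sample_top_k` is a hypothetical helper, not the `generate()` implementation.

```python
import math
import random


def sample_top_k(logits, k, temperature=1.0, rng=random):
    """Sample an index from `logits`, restricted to the k highest-scoring entries."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = logits[top[0]]  # subtract the max for numerical stability
    # Temperature < 1 sharpens the distribution; > 1 flattens it.
    weights = [math.exp((logits[i] - m) / temperature) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]


vocab = ["a", "the", "captain", "team", "banana"]  # toy vocabulary
logits = [2.0, 1.5, 1.2, 0.8, -3.0]                # hypothetical next-token logits
rng = random.Random(0)
print(vocab[sample_top_k(logits, k=3, rng=rng)])   # one of: a, the, captain
```

With `k=1` this reduces to greedy decoding, which always picks the single most likely token.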

### Text Generation - distilgpt2

In [13]:
dgen = pipeline("text-generation", model="distilgpt2")

In [14]:
dgen("Messi wins his first trophy in the USA as", max_length=30, num_return_sequences=2)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'Messi wins his first trophy in the USA as team-mate after just 17.4 years\n\n\n\n\n\n\n\n\n\n\n'},
 {'generated_text': 'Messi wins his first trophy in the USA as the second-best scorer in the entire country.'}]
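Note that `max_length` counts tokens, prompt included, not words. Also note that the first sample ends in a run of newlines. The pipeline accepts `return_full_text=False` to drop the prompt from `generated_text`.

If we keep the full text, a small post-processing step can tidy it. The helper below is a hypothetical sketch, applied to the two outputs shown above.

```python
def continuation_only(generated_text: str, prompt: str) -> str:
    """Strip the echoed prompt and surrounding whitespace from a generation."""
    if generated_text.startswith(prompt):
        generated_text = generated_text[len(prompt):]
    return generated_text.strip()


prompt = "Messi wins his first trophy in the USA as"
outputs = [
    {"generated_text": "Messi wins his first trophy in the USA as team-mate after just 17.4 years\n\n\n\n\n\n\n\n\n\n\n"},
    {"generated_text": "Messi wins his first trophy in the USA as the second-best scorer in the entire country."},
]
for out in outputs:
    print(repr(continuation_only(out["generated_text"], prompt)))
```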

### Fill-mask: Predict a missing word

In [15]:
unmasker = pipeline("fill-mask")

No model was supplied, defaulted to distilroberta-base and revision ec58a5b (https://huggingface.co/distilroberta-base).
Using a pipeline without specifying a model name and revision in production is not recommended.


Some weights of the model checkpoint at distilroberta-base were not used when initializing RobertaForMaskedLM: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [17]:
unmasker("Messi wins his first <mask> in the USA.", top_k=3)

[{'score': 0.1530022770166397,
  'token': 1270,
  'token_str': ' title',
  'sequence': 'Messi wins his first title in the USA.'},
 {'score': 0.13224737346172333,
  'token': 914,
  'token_str': ' match',
  'sequence': 'Messi wins his first match in the USA.'},
 {'score': 0.12616653740406036,
  'token': 2836,
  'token_str': ' championship',
  'sequence': 'Messi wins his first championship in the USA.'}]
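The fill-mask pipeline reads the model's logits at the `<mask>` position (RoBERTa-style models such as `distilroberta-base` use `<mask>`). It turns them into probabilities over the whole vocabulary and returns the `top_k` most likely tokens, each substituted back into the sentence.

The toy sketch below shows that ranking step over a hypothetical five-word vocabulary. The logits are invented, so the scores will not match the output above, and `fill_mask_top_k` is a hypothetical name.

```python
import math


def fill_mask_top_k(template, vocab, logits, top_k=3, mask="<mask>"):
    """Rank candidate tokens for the mask slot and fill them into the template."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)  # softmax over the (toy) vocabulary
    ranked = sorted(zip(vocab, exps), key=lambda pair: pair[1], reverse=True)[:top_k]
    return [
        {"score": e / total, "token_str": tok, "sequence": template.replace(mask, tok)}
        for tok, e in ranked
    ]


vocab = ["title", "match", "championship", "goal", "banana"]
logits = [3.1, 2.9, 2.85, 2.0, -1.0]  # hypothetical
for pred in fill_mask_top_k("Messi wins his first <mask> in the USA.", vocab, logits):
    print(pred["token_str"], round(pred["score"], 3))
```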

### NER pipeline: Identifies named entities

In [18]:
ner = pipeline("ner", aggregation_strategy="simple")  # replaces the deprecated grouped_entities=True

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.weight', 'bert.pooler.dense.bias']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [19]:
ner("Messi wins his first trophy in the USA as a captain of Inter Miami FC.")

[{'entity_group': 'PER',
  'score': 0.9974834,
  'word': 'Messi',
  'start': 0,
  'end': 5},
 {'entity_group': 'LOC',
  'score': 0.99962044,
  'word': 'USA',
  'start': 35,
  'end': 38},
 {'entity_group': 'ORG',
  'score': 0.9979229,
  'word': 'Inter Miami FC',
  'start': 55,
  'end': 69}]
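Without grouping, the NER pipeline returns one prediction per token, tagged with IOB-style labels such as `B-ORG` and `I-ORG`. Grouping merges adjacent tokens of the same entity type into one span and averages their scores. `start` and `end` are character offsets into the input, so `text[55:69]` is `'Inter Miami FC'`.

The sketch below is a simplified version of that merging. It assumes whole-word tokens with known offsets and made-up per-token scores. The real `"simple"` strategy works on tokenizer sub-words, and `group_entities` is a hypothetical name.

```python
from statistics import mean

text = "Messi wins his first trophy in the USA as a captain of Inter Miami FC."

# Hypothetical token-level predictions: (tag, score, start, end).
# Tokens tagged "O" (outside any entity) are omitted.
tokens = [
    ("B-PER", 0.99, 0, 5),
    ("B-LOC", 0.99, 35, 38),
    ("B-ORG", 0.99, 55, 60),
    ("I-ORG", 0.98, 61, 66),
    ("I-ORG", 0.99, 67, 69),
]


def group_entities(tokens, text):
    """Merge consecutive tokens of the same entity type into one group."""
    groups = []
    for tag, score, start, end in tokens:
        prefix, _, etype = tag.partition("-")
        prev = groups[-1] if groups else None
        # Extend the current group only for an I- tag of the same type that is
        # adjacent to it (separated by at most one space).
        if prev and prefix == "I" and prev["entity_group"] == etype and start - prev["end"] <= 1:
            prev["end"] = end
            prev["scores"].append(score)
        else:
            groups.append({"entity_group": etype, "start": start, "end": end, "scores": [score]})
    return [
        {"entity_group": g["entity_group"], "score": mean(g["scores"]),
         "word": text[g["start"]:g["end"]], "start": g["start"], "end": g["end"]}
        for g in groups
    ]


for ent in group_entities(tokens, text):
    print(ent["entity_group"], ent["word"], ent["start"], ent["end"])
```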

### Question Answering Pipeline

In [20]:
answerer = pipeline("question-answering")

No model was supplied, defaulted to distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [21]:
answerer(
    question="Where did He won?",
    context="Messi wins his first trophy in the USA."
)

{'score': 0.8196228742599487, 'start': 35, 'end': 38, 'answer': 'USA'}
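The question-answering pipeline is extractive. The model scores every token as a possible answer start and as a possible answer end. The pipeline then picks the best valid span (end not before start, length capped) and maps it back to character offsets in the context, which is where `'start': 35, 'end': 38` come from.

The sketch below is minimal. It assumes word/punctuation tokens and invented logits. `best_span` is a hypothetical helper, and the `max_answer_len=15` cap is an assumption modeled on the pipeline's default.

```python
import re
from itertools import product

context = "Messi wins his first trophy in the USA."
# Word/punctuation tokens with their character offsets.
spans = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+|[^\w\s]", context)]

# Hypothetical start/end logits, one per token ("USA" is token 7).
start_logits = [0.1, -1.0, -1.2, -0.5, 0.3, -0.8, 0.2, 4.0, -2.0]
end_logits = [-0.5, -1.1, -1.0, -0.7, 0.9, -0.9, -0.3, 3.5, 0.4]


def best_span(start_logits, end_logits, max_answer_len=15):
    """Pick (start, end) token indices maximizing start logit + end logit.

    Maximizing the logit sum selects the same span as maximizing the product
    of the softmaxed start and end probabilities.
    """
    best_score, best = float("-inf"), (0, 0)
    for s, e in product(range(len(start_logits)), range(len(end_logits))):
        if s <= e < s + max_answer_len and start_logits[s] + end_logits[e] > best_score:
            best_score, best = start_logits[s] + end_logits[e], (s, e)
    return best


s, e = best_span(start_logits, end_logits)
answer_start, answer_end = spans[s][1], spans[e][2]
print({"start": answer_start, "end": answer_end, "answer": context[answer_start:answer_end]})
# {'start': 35, 'end': 38, 'answer': 'USA'}
```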

### Summarization

In [22]:
summarize = pipeline("summarization")

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [23]:
summarize("""The fire-ravaged Canadian province of British Columbia was under a state of emergency for a second day, as a wildfire in and around the resort city of Kelowna continued to consume houses.Firefighters said on Saturday that a drop in wind was aiding their efforts to control the blaze, but that the flames and embers continued to blow toward the city.

The fire is one of two in Canada that have led thousands to evacuate their homes in the last week. Hundreds of miles away from Kelowna, a wildfire converging on the city of Yellowknife, Northwest Territories, prompted officials to order a mass evacuation of the entire city.

Officials said Saturday that the fire remained stalled a few miles from Yellowknife, a welcome reprieve, though the danger remained serious and imminent.""")

[{'summary_text': ' British Columbia is under a state of emergency for a second day as a wildfire continues to consume homes . Firefighters say a drop in wind is helping efforts to control the blaze . The fire is one of two in Canada that have led thousands to evacuate their homes in the last week .'}]
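The summary above puts a space before each period (" homes . Firefighters"). This appears to be a quirk that `sshleifer/distilbart-cnn-12-6` picked up from its CNN/DailyMail-style training data. A small regex cleanup, sketched here as a hypothetical helper, tidies the text for display.

```python
import re


def tidy_summary(text: str) -> str:
    """Remove whitespace before punctuation and trim the ends."""
    return re.sub(r"\s+([.,!?;:])", r"\1", text).strip()


summary = (" British Columbia is under a state of emergency for a second day as a wildfire "
           "continues to consume homes . Firefighters say a drop in wind is helping efforts "
           "to control the blaze . The fire is one of two in Canada that have led thousands "
           "to evacuate their homes in the last week .")
print(tidy_summary(summary))
```

The pipeline call also accepts generation arguments such as `min_length` and `max_length` if we want to control the summary's length.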

### Machine Translation

In [24]:
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-fr-en")

In [25]:
translator("C'est quoi les Outre-mer?")

[{'translation_text': "What's overseas?"}]
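The Helsinki-NLP OPUS-MT checkpoints follow the naming pattern `Helsinki-NLP/opus-mt-{source}-{target}`, so switching language pairs usually means changing only the model ID. Below is a minimal sketch of a hypothetical helper that builds the ID. Not every language pair is published on the Hub, so check that the resulting model exists before relying on it.

```python
def opus_mt_model_id(source: str, target: str) -> str:
    """Build an OPUS-MT model ID from two language codes (hypothetical helper)."""
    return f"Helsinki-NLP/opus-mt-{source.lower()}-{target.lower()}"


print(opus_mt_model_id("fr", "en"))  # Helsinki-NLP/opus-mt-fr-en
# Usage (downloads a model, so it is left as a comment here):
# translator = pipeline("translation", model=opus_mt_model_id("en", "fr"))
```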