# HF Transformers Pipeline

In [1]:
from transformers import pipeline


## Sentiment Analysis

In [2]:
classifier = pipeline("sentiment-analysis")

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Xformers is not installed correctly. If you want to use memory_efficient_attention to accelerate training use the following command to install Xformers
pip install xformers.


The log above shows that the default model is `distilbert-base-uncased-finetuned-sst-2-english`.
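The warning above recommends naming the model and revision explicitly. A minimal sketch of one way to do that follows, using only the model names and revision hashes printed in this notebook's log messages. The `PINNED` dictionary and the `build_pipeline` helper are hypothetical names introduced for illustration, and the helper is defined but not called here so that nothing is downloaded.

```python
# Model names and revisions copied from the "No model was supplied" log lines
# in this notebook. PINNED and build_pipeline are illustrative names only.
PINNED = {
    "sentiment-analysis": {
        "model": "distilbert-base-uncased-finetuned-sst-2-english",
        "revision": "af0f99b",
    },
    "zero-shot-classification": {
        "model": "facebook/bart-large-mnli",
        "revision": "c626438",
    },
    "text-generation": {"model": "gpt2", "revision": "6c0e608"},
    "fill-mask": {"model": "distilroberta-base", "revision": "ec58a5b"},
}


def build_pipeline(task):
    """Create a pipeline with an explicit model and revision."""
    # Imported lazily so this sketch can be read and loaded
    # without transformers being installed.
    from transformers import pipeline

    return pipeline(task, **PINNED[task])


print(PINNED["sentiment-analysis"])
```

Calling `build_pipeline("sentiment-analysis")` should then load the same checkpoint as the default, without triggering the warning.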

Let's use an IMDB review of the movie Borat.

In [3]:
review = '''This movie is not for the faint of heart or the easily offended. 
It is a brilliant and scathing critique of the American society and culture, disguised as a comedy.
The movie exposes the hypocrisy, greed, violence, racism, and ignorance that plague the nation, through the absurd and hilarious adventures of Borat Sagdiyev, a Kazakh journalist who travels to the US to make a documentary. 
The movie is full of shocking and outrageous scenes that will make you laugh, cringe, and question your own beliefs and values. 
The movie is not meant to be taken literally or seriously, but rather as a mirror that reflects the ugly truth about ourselves. 
The movie is a masterpiece of irony and satire, and one of the most original and daring comedies ever made.'''

In [4]:
classifier(review)

[{'label': 'POSITIVE', 'score': 0.9995748400688171}]

Let's use a neutral review and see how the model classifies it.

In [5]:
neutral_review = '''
"A Rollercoaster of Emotions" is a visually stunning masterpiece that takes audiences on a thrilling and captivating journey. 
The breathtaking cinematography and impeccable special effects create a mesmerizing world that pulls you right into the heart of the story. 
The performances by the lead actors are commendable, with moments of raw and intense emotion that leave a lasting impact.

However, the film's pacing at times feels uneven, with certain scenes dragging on a bit too long while others rush by, leaving you slightly disoriented. 
The intricate plot, while intriguing, can also become convoluted, requiring the audience to stay fully engaged to grasp all the nuances. 
Despite these minor drawbacks, "A Rollercoaster of Emotions" is an experience worth embarking upon for its highs that genuinely touch the soul, even though it occasionally loses its way on the journey.
'''

In [6]:
classifier(neutral_review)

[{'label': 'POSITIVE', 'score': 0.9997828602790833}]

The review above is closer to neutral, yet the model still labels it positive, and with an even higher score than the Borat review.
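One likely reason is that this SST-2 model only has two labels, `POSITIVE` and `NEGATIVE`, so it cannot answer "neutral". A rough way to surface mixed sentiment, offered here as an assumption and not something the notebook ran, is to classify each sentence separately and count the labels. In the sketch below, `sentence_level_sentiment` and `fake_classifier` are hypothetical names; the fake classifier only stands in for the real pipeline so that the code runs without a model.

```python
import re


def sentence_level_sentiment(classifier, text):
    """Classify each sentence on its own and count the labels."""
    # Naive split after ., ! or ? followed by whitespace. This is an
    # assumption that is fine for a quick look, not a robust tokenizer.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    counts = {}
    for sentence in sentences:
        label = classifier(sentence)[0]["label"]
        counts[label] = counts.get(label, 0) + 1
    return counts


def fake_classifier(sentence):
    """Stand-in that mimics the pipeline's output shape with keyword rules."""
    negative_cues = ("however", "uneven", "convoluted")
    is_negative = any(cue in sentence.lower() for cue in negative_cues)
    return [{"label": "NEGATIVE" if is_negative else "POSITIVE", "score": 1.0}]


sample = (
    "The visuals are stunning. However, the pacing feels uneven. "
    "The plot can become convoluted."
)
counts = sentence_level_sentiment(fake_classifier, sample)
print(counts)
```

With the real `classifier` object passed in place of `fake_classifier`, a review that mixes both labels could then be treated as neutral or mixed.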

In [7]:
classifier = pipeline("sentiment-analysis", model='hipnologo/gpt2-imdb-finetune')

In [8]:
classifier(neutral_review)

[{'label': 'LABEL_1', 'score': 0.9971871972084045}]

The `hipnologo/gpt2-imdb-finetune` model also classifies the neutral review as positive (reported as `LABEL_1`), again with a high score.
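Generic names such as `LABEL_1` appear when a checkpoint does not ship readable label names. A small sketch of renaming them after the fact is shown below. The mapping is an assumption: the notebook reads `LABEL_1` as positive, and `LABEL_0` as negative is inferred from that, so it is worth checking against the model card or `classifier.model.config.id2label` before relying on it.

```python
# Assumed mapping (not confirmed by the notebook): LABEL_1 is read as
# positive above, so LABEL_0 is taken to be negative.
LABEL_NAMES = {"LABEL_0": "NEGATIVE", "LABEL_1": "POSITIVE"}


def rename_labels(results, names=LABEL_NAMES):
    """Return a copy of the pipeline results with readable label names."""
    return [{**r, "label": names.get(r["label"], r["label"])} for r in results]


raw = [{"label": "LABEL_1", "score": 0.9971871972084045}]
renamed = rename_labels(raw)
print(renamed)
```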

## Zero-Shot Classification

In [9]:
zeroshot_classifier = pipeline("zero-shot-classification")

No model was supplied, defaulted to facebook/bart-large-mnli and revision c626438 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [10]:
candidate_text = '''Amidst a fierce political debate, a last-minute goal secured victory in a sports showdown. 
Under the starlit sky, a new species emerged from the ocean's depths, as whispers of a forgotten melody lingered. 
Tokyo's vibrant streets witnessed a romantic encounter, while groundbreaking AI reshaped technology. 
In the Renaissance era, artists created masterpieces rivaling historic tales, and a decadent chocolate cake offered a symphony of flavors, 
a true culinary masterpiece'''
candidate_labels = ["Sports", "Science", "Romance", "Political", "Travel", "Technology", "Horror", "Culinary", "History", "Finance"]

In [11]:
zeroshot_classifier(candidate_text, candidate_labels=candidate_labels)

{'sequence': "Amidst a fierce political debate, a last-minute goal secured victory in a sports showdown. \nUnder the starlit sky, a new species emerged from the ocean's depths, as whispers of a forgotten melody lingered. \nTokyo's vibrant streets witnessed a romantic encounter, while groundbreaking AI reshaped technology. \nIn the Renaissance era, artists created masterpieces rivaling historic tales, and a decadent chocolate cake offered a symphony of flavors, \na true culinary masterpiece",
 'labels': ['Sports',
  'Political',
  'Romance',
  'Culinary',
  'Travel',
  'History',
  'Technology',
  'Science',
  'Finance',
  'Horror'],
 'scores': [0.17671708762645721,
  0.16522802412509918,
  0.1507391482591629,
  0.13005705177783966,
  0.1155952736735344,
  0.09281495958566666,
  0.07342732697725296,
  0.04154600948095322,
  0.034687407314777374,
  0.01918768137693405]}
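The result lists the labels in descending score order, and the scores add up to roughly 1, so the labels compete with each other. The sketch below pairs each label with its score and keeps those above a cutoff; the `labels_above` helper and the 0.10 threshold are arbitrary choices made for illustration. The pipeline also accepts a `multi_label=True` argument that scores each label independently, which was not tried here.

```python
# Labels and scores copied from the output above.
result = {
    "labels": ["Sports", "Political", "Romance", "Culinary", "Travel",
               "History", "Technology", "Science", "Finance", "Horror"],
    "scores": [0.17671708762645721, 0.16522802412509918, 0.1507391482591629,
               0.13005705177783966, 0.1155952736735344, 0.09281495958566666,
               0.07342732697725296, 0.04154600948095322, 0.034687407314777374,
               0.01918768137693405],
}


def labels_above(result, threshold):
    """Return (label, score) pairs whose score is at least `threshold`."""
    pairs = zip(result["labels"], result["scores"])
    return [(label, score) for label, score in pairs if score >= threshold]


total = sum(result["scores"])
selected = labels_above(result, 0.10)  # 0.10 is an arbitrary cutoff

print(f"sum of scores: {total:.4f}")
for label, score in selected:
    print(f"{label:<10} {score:.3f}")
```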

## Text Generation

In [12]:
generator = pipeline("text-generation")

No model was supplied, defaulted to gpt2 and revision 6c0e608 (https://huggingface.co/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [22]:
generator("I wondered lonely as a cloud", num_return_sequences=2, max_length=30)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'I wondered lonely as a cloud over me for a moment."\n\nI reached out to her and lifted her hand.\n\n"Let me think'},
 {'generated_text': "I wondered lonely as a cloud-haired wizard on my way to your city. You know I'd be here, waiting for you. This is my"}]
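Each `generated_text` value starts with the original prompt. When only the continuation is wanted, the prompt can be sliced off, as in the sketch below (the `continuation` helper is a hypothetical name). The text-generation pipeline also has a `return_full_text=False` option that should have a similar effect, though it was not used in this notebook.

```python
prompt = "I wondered lonely as a cloud"

# Outputs copied from the cell above.
outputs = [
    {"generated_text": 'I wondered lonely as a cloud over me for a moment."\n\n'
                       'I reached out to her and lifted her hand.\n\n"Let me think'},
    {"generated_text": "I wondered lonely as a cloud-haired wizard on my way to "
                       "your city. You know I'd be here, waiting for you. This is my"},
]


def continuation(prompt, generated_text):
    """Drop the prompt prefix if present, otherwise return the text unchanged."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):]
    return generated_text


continuations = [continuation(prompt, o["generated_text"]) for o in outputs]
for text in continuations:
    print(repr(text))
```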

## Text Generation with Bengali Language

In [14]:
gen = pipeline('text-generation', model='ritog/bangla-gpt2')

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [15]:
gen('একা আমি ফিরব না আর')

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'একা আমি ফিরব না আর হোসেন’জেমসের জন্য কাঁদলেন নোবেলজয়ী, পরিবারে শোকের ছায়া'}]

## Mask Filling

In [16]:
unmasker = pipeline('fill-mask')

No model was supplied, defaulted to distilroberta-base and revision ec58a5b (https://huggingface.co/distilroberta-base).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at distilroberta-base were not used when initializing RobertaForMaskedLM: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
- This IS expected if you are initializing RobertaForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [17]:
unmasker('I walk a <mask> road the only one that I have ever known', top_k=3)

[{'score': 0.24250896275043488,
  'token': 20100,
  'token_str': ' lonely',
  'sequence': 'I walk a lonely road the only one that I have ever known'},
 {'score': 0.22409918904304504,
  'token': 10667,
  'token_str': ' dirt',
  'sequence': 'I walk a dirt road the only one that I have ever known'},
 {'score': 0.06567858159542084,
  'token': 26923,
  'token_str': ' gravel',
  'sequence': 'I walk a gravel road the only one that I have ever known'}]

In [18]:
unmasker = pipeline('fill-mask', model='bert-base-cased')

Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForMaskedLM: ['cls.seq_relationship.bias', 'cls.seq_relationship.weight', 'bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [19]:
unmasker('[MASK] roads diverged in a yellow wood', top_k=3)

[{'score': 0.7492600679397583,
  'token': 1109,
  'token_str': 'The',
  'sequence': 'The roads diverged in a yellow wood'},
 {'score': 0.032785188406705856,
  'token': 2695,
  'token_str': 'Both',
  'sequence': 'Both roads diverged in a yellow wood'},
 {'score': 0.02911911904811859,
  'token': 1103,
  'token_str': 'the',
  'sequence': 'the roads diverged in a yellow wood'}]
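Note that the two models use different mask tokens: `<mask>` for `distilroberta-base` and `[MASK]` for `bert-base-cased`. One way to avoid hard-coding the token in each prompt is to keep a placeholder in the text and fill it per model, as sketched below. The `MASK_TOKENS` table and `masked_prompt` helper are illustrative names; with a loaded pipeline, the token can instead be read from `unmasker.tokenizer.mask_token`.

```python
# Mask tokens for the two models used in this section.
MASK_TOKENS = {
    "distilroberta-base": "<mask>",
    "bert-base-cased": "[MASK]",
}


def masked_prompt(template, model_name):
    """Fill the `{mask}` placeholder with the given model's mask token."""
    return template.format(mask=MASK_TOKENS[model_name])


roberta_prompt = masked_prompt(
    "I walk a {mask} road the only one that I have ever known", "distilroberta-base"
)
bert_prompt = masked_prompt("{mask} roads diverged in a yellow wood", "bert-base-cased")
print(roberta_prompt)
print(bert_prompt)
```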

Let's append the line `, And sorry I could not travel both` and see the output. The added line gives the model a stronger hint that there are two roads.

In [20]:
unmasker('[MASK] roads diverged in a yellow wood, And sorry I could not travel both', top_k=3)

[{'score': 0.6050049066543579,
  'token': 1109,
  'token_str': 'The',
  'sequence': 'The roads diverged in a yellow wood, And sorry I could not travel both'},
 {'score': 0.13009047508239746,
  'token': 3458,
  'token_str': 'Our',
  'sequence': 'Our roads diverged in a yellow wood, And sorry I could not travel both'},
 {'score': 0.02842896804213524,
  'token': 1960,
  'token_str': 'Two',
  'sequence': 'Two roads diverged in a yellow wood, And sorry I could not travel both'}]

With the extra line, the model now suggests `Two`, but only as its third prediction.
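To compare where a token lands in each prediction list, a small lookup helper is enough. The sketch below uses the two `top_k=3` outputs shown above, trimmed to the fields it needs; `rank_of` is a hypothetical helper name.

```python
# Predictions copied from the two outputs above (only the fields used here).
without_hint = [
    {"token_str": "The", "score": 0.7492600679397583},
    {"token_str": "Both", "score": 0.032785188406705856},
    {"token_str": "the", "score": 0.02911911904811859},
]
with_hint = [
    {"token_str": "The", "score": 0.6050049066543579},
    {"token_str": "Our", "score": 0.13009047508239746},
    {"token_str": "Two", "score": 0.02842896804213524},
]


def rank_of(predictions, token):
    """Return the 1-based rank of `token`, or None if it is not listed."""
    for rank, prediction in enumerate(predictions, start=1):
        # Some tokenizers return a leading space in token_str, so strip it.
        if prediction["token_str"].strip() == token:
            return rank
    return None


print(rank_of(without_hint, "Two"))
print(rank_of(with_hint, "Two"))
```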

## NER (Named Entity Recognition)

The entity types looked at here are:
1. Name 
2. Place
3. Organization

In [32]:
ner = pipeline('ner', grouped_entities=True)

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [33]:
ner('''After a long and wretched flight
That stretched from daylight into night,
Where babies wept and tempers shattered
And the plane lurched and whiskey splattered
Over my plastic food, I came
To claim my bags from Baggage Claim

Around, the carousel went around
The anxious travelers sought and found
Their bags, intact or gently battered,
But to my foolish eyes what mattered
Was a brave suitcase, red and small,
That circled round, not mine at all.

I knew that bag. It must be hers.
We hadnt met in seven years!
And as the metal plates squealed and clattered
My happy memories chimed and chattered.
An old man pulled it of the Claim.
My bags appeared: I did the same. ''')

[]

No entities are returned because the poem above, written by Vikram Seth, contains nothing that the model recognizes as a `name`, `place`, or `organization`. Let's try a sentence with a name and a place in it.

In [34]:
ner('The above poem is written by Vikram Seth, who was born in Kolkata')

[{'entity_group': 'PER',
  'score': 0.9995754,
  'word': 'Vikram Seth',
  'start': 29,
  'end': 40},
 {'entity_group': 'LOC',
  'score': 0.9977036,
  'word': 'Kolkata',
  'start': 58,
  'end': 65}]
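The `start` and `end` values are character offsets into the input string, so each entity can be recovered by slicing the original text. A short check using the output above:

```python
text = "The above poem is written by Vikram Seth, who was born in Kolkata"

# Grouped entities copied from the output above (scores omitted).
entities = [
    {"entity_group": "PER", "word": "Vikram Seth", "start": 29, "end": 40},
    {"entity_group": "LOC", "word": "Kolkata", "start": 58, "end": 65},
]

# Slice the original text with the reported offsets.
spans = [(e["entity_group"], text[e["start"]:e["end"]]) for e in entities]
print(spans)
```

Slicing by offset is handy when the exact original spelling or position is needed, for example to highlight entities in the source text.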

Below is the output without grouping the entities.

In [40]:
ner = pipeline('ner', grouped_entities=False)

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [41]:
ner('The above poem is written by Vikram Seth, who was born in Kolkata')

[{'entity': 'I-PER',
  'score': 0.99954695,
  'index': 7,
  'word': 'V',
  'start': 29,
  'end': 30},
 {'entity': 'I-PER',
  'score': 0.9992286,
  'index': 8,
  'word': '##ik',
  'start': 30,
  'end': 32},
 {'entity': 'I-PER',
  'score': 0.9996859,
  'index': 9,
  'word': '##ram',
  'start': 32,
  'end': 35},
 {'entity': 'I-PER',
  'score': 0.9998404,
  'index': 10,
  'word': 'Seth',
  'start': 36,
  'end': 40},
 {'entity': 'I-LOC',
  'score': 0.9977036,
  'index': 16,
  'word': 'Kolkata',
  'start': 58,
  'end': 65}]
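Without grouping, the name is split into word pieces (`V`, `##ik`, `##ram`, `Seth`). The sketch below shows one simplified way to merge them by hand: consecutive tokens with the same entity type are joined and the text is read back from the character offsets. This is an illustration under those assumptions, not the library's own grouping logic, and `merge_tokens` is a hypothetical helper. Newer releases of the library expose grouping through an `aggregation_strategy` argument.

```python
text = "The above poem is written by Vikram Seth, who was born in Kolkata"

# Ungrouped tokens copied from the output above (scores omitted).
tokens = [
    {"entity": "I-PER", "index": 7, "word": "V", "start": 29, "end": 30},
    {"entity": "I-PER", "index": 8, "word": "##ik", "start": 30, "end": 32},
    {"entity": "I-PER", "index": 9, "word": "##ram", "start": 32, "end": 35},
    {"entity": "I-PER", "index": 10, "word": "Seth", "start": 36, "end": 40},
    {"entity": "I-LOC", "index": 16, "word": "Kolkata", "start": 58, "end": 65},
]


def merge_tokens(text, tokens):
    """Join consecutive tokens of the same entity type into one span."""
    groups = []
    for token in tokens:
        kind = token["entity"].split("-")[-1]  # "I-PER" -> "PER"
        last = groups[-1] if groups else None
        if last and last["kind"] == kind and token["index"] == last["last_index"] + 1:
            # Adjacent token of the same type: extend the current span.
            last["end"] = token["end"]
            last["last_index"] = token["index"]
        else:
            groups.append({"kind": kind, "start": token["start"],
                           "end": token["end"], "last_index": token["index"]})
    return [(g["kind"], text[g["start"]:g["end"]]) for g in groups]


merged = merge_tokens(text, tokens)
print(merged)
```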

## Part of Speech Tagging
> Assigns grammatical categories (such as nouns and verbs) to the words in a sentence.

In [56]:
pos = pipeline("token-classification", model="vblagoje/bert-english-uncased-finetuned-pos", grouped_entities=True)

Some weights of the model checkpoint at vblagoje/bert-english-uncased-finetuned-pos were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [57]:
pos('''My name is Anubhav and I am in Seattle''')

[{'entity_group': 'PRON',
  'score': 0.9994246,
  'word': 'my',
  'start': 0,
  'end': 2},
 {'entity_group': 'NOUN',
  'score': 0.9972132,
  'word': 'name',
  'start': 3,
  'end': 7},
 {'entity_group': 'AUX',
  'score': 0.9963749,
  'word': 'is',
  'start': 8,
  'end': 10},
 {'entity_group': 'PROPN',
  'score': 0.99478185,
  'word': 'anubhav',
  'start': 11,
  'end': 18},
 {'entity_group': 'CCONJ',
  'score': 0.9991928,
  'word': 'and',
  'start': 19,
  'end': 22},
 {'entity_group': 'PRON',
  'score': 0.99943525,
  'word': 'i',
  'start': 23,
  'end': 24},
 {'entity_group': 'AUX',
  'score': 0.99505144,
  'word': 'am',
  'start': 25,
  'end': 27},
 {'entity_group': 'ADP',
  'score': 0.99937147,
  'word': 'in',
  'start': 28,
  'end': 30},
 {'entity_group': 'PROPN',
  'score': 0.99885774,
  'word': 'seattle',
  'start': 31,
  'end': 38}]
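Because this model is uncased, the `word` field comes back in lowercase (`anubhav`, `seattle`). The original casing can be recovered from the character offsets, as in this sketch that pulls out the proper nouns from the output above:

```python
text = "My name is Anubhav and I am in Seattle"

# Tags copied from the output above (scores omitted).
tags = [
    {"entity_group": "PRON", "word": "my", "start": 0, "end": 2},
    {"entity_group": "NOUN", "word": "name", "start": 3, "end": 7},
    {"entity_group": "AUX", "word": "is", "start": 8, "end": 10},
    {"entity_group": "PROPN", "word": "anubhav", "start": 11, "end": 18},
    {"entity_group": "CCONJ", "word": "and", "start": 19, "end": 22},
    {"entity_group": "PRON", "word": "i", "start": 23, "end": 24},
    {"entity_group": "AUX", "word": "am", "start": 25, "end": 27},
    {"entity_group": "ADP", "word": "in", "start": 28, "end": 30},
    {"entity_group": "PROPN", "word": "seattle", "start": 31, "end": 38},
]

# Keep only proper nouns and read them from the original, cased text.
proper_nouns = [
    text[t["start"]:t["end"]] for t in tags if t["entity_group"] == "PROPN"
]
print(proper_nouns)
```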

In [61]:
pos('''With no companion to my mood,
Against the wind as it should be,
I walk, but in my solitude
Bow to the wind that buffets me.''')

[{'entity_group': 'ADP',
  'score': 0.9977035,
  'word': 'with',
  'start': 0,
  'end': 4},
 {'entity_group': 'DET',
  'score': 0.9994331,
  'word': 'no',
  'start': 5,
  'end': 7},
 {'entity_group': 'NOUN',
  'score': 0.9990916,
  'word': 'companion',
  'start': 8,
  'end': 17},
 {'entity_group': 'ADP',
  'score': 0.9994623,
  'word': 'to',
  'start': 18,
  'end': 20},
 {'entity_group': 'PRON',
  'score': 0.9995864,
  'word': 'my',
  'start': 21,
  'end': 23},
 {'entity_group': 'NOUN',
  'score': 0.99918777,
  'word': 'mood',
  'start': 24,
  'end': 28},
 {'entity_group': 'PUNCT',
  'score': 0.9996674,
  'word': ',',
  'start': 28,
  'end': 29},
 {'entity_group': 'ADP',
  'score': 0.99920064,
  'word': 'against',
  'start': 30,
  'end': 37},
 {'entity_group': 'DET',
  'score': 0.99949336,
  'word': 'the',
  'start': 38,
  'end': 41},
 {'entity_group': 'NOUN',
  'score': 0.9991099,
  'word': 'wind',
  'start': 42,
  'end': 46},
 {'entity_group': 'SCONJ',
  'score': 0.9976913,
  'word':

## Question Answering
> Extracts the answer from the given context; it does not generate a new answer.

In [62]:
question_answerer = pipeline("question-answering")

No model was supplied, defaulted to distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.

In [64]:
question_answerer(question="When do I walk?", context='''
What can I say to you? How can I retract
All that that fool my voice has spoken -
Now that the facts are plain, the placid surface cracked,
The protocols of friendship broken?
I cannot walk by day as now I walk at dawn
Past the still house where you lie sleeping.
May the sun burn these footprints on the lawn
And hold you in its warmth and keeping.''')

{'score': 0.563761830329895, 'start': 215, 'end': 219, 'answer': 'dawn'}
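Since the model is extractive, `start` and `end` are character offsets into the context, and slicing the context with them gives back the answer. A quick check against the output above:

```python
context = '''
What can I say to you? How can I retract
All that that fool my voice has spoken -
Now that the facts are plain, the placid surface cracked,
The protocols of friendship broken?
I cannot walk by day as now I walk at dawn
Past the still house where you lie sleeping.
May the sun burn these footprints on the lawn
And hold you in its warmth and keeping.'''

# Result copied from the output above.
result = {"score": 0.563761830329895, "start": 215, "end": 219, "answer": "dawn"}

# The answer is a slice of the context, not newly generated text.
extracted = context[result["start"]:result["end"]]
print(extracted)
```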

## Summarization

In [65]:
summarizer = pipeline("summarization")

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.

In [69]:
summarizer('''
In the multidimensional labyrinth of quantum mechanics, 
where particles can exist in a superposition of states and become entangled across vast distances instantaneously, 
the very act of observation wields the power to collapse wavefunctions and determine measurable outcomes, 
challenging our intuitive understanding of reality. 
At the heart of this enigma lies the Schrödinger's cat paradox, 
an illustrative gedankenexperiment in which a feline entity exists simultaneously 
in a bewildering duality of being both alive and dead until observed, 
exposing the profound role of consciousness and measurement in the intricate tapestry of the quantum realm. 
The burgeoning field of quantum computing, 
leveraging the intricate dance of qubits entangled through delicate quantum gates, 
promises to revolutionize computation by harnessing the inherent parallelism and uncertainty of quantum states, 
potentially unraveling solutions to problems that were hitherto computationally intractable, 
and yet, it grapples with the imperfections of decoherence and error correction, 
mirroring the intricate dance of probability that underpins the very fabric of the quantum universe.
''')

[{'summary_text': " In the Schrödinger's cat paradox, a feline entity exists simultaneously, in a bewildering duality of being both alive and dead until observed . The burgeoning field of quantum computing, leveraging the intricate dance of qubits entangled through delicate quantum gates, \xa0promises to revolutionize computation ."}]
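The raw summary has a leading space, a non-breaking space (`\xa0`), and spaces before the full stops. A small cleanup step, sketched below with a hypothetical `tidy_summary` helper, normalizes the whitespace before the text is displayed.

```python
import re

# Summary copied from the output above.
summary = (
    " In the Schrödinger's cat paradox, a feline entity exists simultaneously, "
    "in a bewildering duality of being both alive and dead until observed . "
    "The burgeoning field of quantum computing, leveraging the intricate dance "
    "of qubits entangled through delicate quantum gates, \xa0promises to "
    "revolutionize computation ."
)


def tidy_summary(text):
    """Collapse whitespace and remove spaces before punctuation."""
    # \s matches the non-breaking space too, so it is collapsed here as well.
    text = re.sub(r"\s+", " ", text).strip()
    return re.sub(r" ([.,;:!?])", r"\1", text)


tidy = tidy_summary(summary)
print(tidy)
```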

## Translation

In [73]:
translator = pipeline('translation', model='salesken/translation-hi-en')


In [74]:
translator('''
अब तक़ रही थी खुशियों की मिठास,
पर आज आयी ये अजनबी उदासी।
दिल में बसी है एक अजीब सी बेचैनी,
बिना किसी वजह की, बिना किसी रास्ते।

आसमान पर छाया है गहरा अँधेरा,
जैसे मन में बसी है कोई गुहा।
खो गई है सब वो हँसी की बुनाई,
बन गए हैं तन्हाई के जाल में फसे।

यादें लायी हैं ये दर्द की बूँदें,
आँखों से बह रही हैं बेचैन आहें।
मन में छाई है गहरी उदासी की रात,
जैसे खो गया हो किसी सपने का सारा सफर।
''')

[{'translation_text': 'The sweetness of happiness was so far, but this stranger came to the fore. The heart is in a strange state, without any reason. The sky is full of darkness, like a dark cry.'}]
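The English output appears to cover only the opening lines of the twelve-line poem, possibly because of a generation length limit (an assumption, not verified here). One workaround to try is translating stanza by stanza. In the sketch below, `translate_in_chunks` is a hypothetical helper and `fake_translator` only mimics the pipeline's output shape so that the code runs without a model; whether this improves the real model's output was not tested.

```python
def translate_in_chunks(translator, text):
    """Translate each blank-line-separated stanza on its own."""
    stanzas = [s.strip() for s in text.split("\n\n") if s.strip()]
    return [translator(stanza)[0]["translation_text"] for stanza in stanzas]


def fake_translator(stanza):
    """Stand-in returning the same structure as the translation pipeline."""
    line_count = len(stanza.splitlines())
    return [{"translation_text": f"<{line_count} lines translated>"}]


poem = "line one\nline two\n\nline three\nline four\n\nline five"
chunks = translate_in_chunks(fake_translator, poem)
print(chunks)
```

Passing the real `translator` object in place of `fake_translator` would return one translated string per stanza.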

Now let's pass the English translation above to an English-to-Hindi model.

In [75]:
translator = pipeline('translation', model='Helsinki-NLP/opus-mt-en-hi')


In [76]:
translator('''
The sweetness of happiness was so far, but this stranger came to the fore. The heart is in a strange state, without any reason. The sky is full of darkness, like a dark cry.
''')

[{'translation_text': 'और सुख का मीठा हाल तो दूर था, परन्तु वह परदेशी देखने को आया था, परन्तु मन पराया हुआ है; किसी कारण से नहीं, आकाश अन्धकार से भरा हुआ है, और घोर अन्धकार से भरा हुआ है।'}]