In [1]:
!pip install transformers



In [2]:
import transformers

In [3]:
!pip install transformers[sentencepiece]



# Transformer Models
## Tool 1: pipeline() function
The most basic object in the Transformers library is the pipeline() function.<br>
It connects the model with its necessary preprocessing and postprocessing steps, allowing us to directly input any text and get an intelligible answer.

In [4]:
from transformers import pipeline

In [5]:
#SENTIMENT ANALYSIS
# This pipeline selects a default pretrained model that has been fine-tuned for sentiment analysis in English.
# The model is downloaded and cached when the classifier object is created.
classifier = pipeline("sentiment-analysis")
classifier("I've been waiting for HuggingFace course my whole life")

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'label': 'POSITIVE', 'score': 0.9861214756965637}]

In [6]:
classifier("I would love to go to South Korea, but I can't eat sea food as I'm vegetarian.")

[{'label': 'NEGATIVE', 'score': 0.9685031771659851}]

In [7]:
classifier(["I am going to home tomorrow."," My studies will get affected due to holidays."])

[{'label': 'POSITIVE', 'score': 0.9996745586395264},
 {'label': 'NEGATIVE', 'score': 0.9909992218017578}]

There are three main steps involved when some text is passed to a pipeline:
<ol type=1>
<li>The text is PREPROCESSED into a format the model can understand.</li>
<li>The preprocessed inputs are passed to the model.</li>
<li>The predictions of the model are POST-PROCESSED, so you can make sense of them.</li>
</ol>
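Here is a toy sketch in plain Python that makes the three steps concrete. It does not call the Transformers library. The whitespace tokenizer, the hand-picked word weights in `TOY_WEIGHTS`, and all function names are hypothetical stand-ins. What it shares with the sentiment-analysis pipeline is the overall shape: raw text is preprocessed, a "model" turns it into one raw score (logit) per label, and post-processing applies a softmax and returns a `{'label': ..., 'score': ...}` dict like the ones printed above.

```python
import math

# Hypothetical per-word contributions to the [NEGATIVE, POSITIVE] logits.
TOY_WEIGHTS = {
    "love": (-1.0, 2.0),
    "great": (-0.5, 1.5),
    "hate": (2.0, -1.0),
    "affected": (1.0, -0.5),
}
LABELS = ["NEGATIVE", "POSITIVE"]


def preprocess(text):
    """Step 1: turn raw text into model inputs (here: lowercase word tokens)."""
    return [w.strip(".,!?") for w in text.lower().split()]


def model(tokens):
    """Step 2: map the inputs to raw, unnormalised scores (logits), one per label."""
    logits = [0.0, 0.0]
    for tok in tokens:
        neg, pos = TOY_WEIGHTS.get(tok, (0.0, 0.0))
        logits[0] += neg
        logits[1] += pos
    return logits


def postprocess(logits):
    """Step 3: softmax the logits and report the most likely label with its probability."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": LABELS[best], "score": probs[best]}


def toy_pipeline(text):
    return postprocess(model(preprocess(text)))


print(toy_pipeline("I love this course!"))
```

A real pipeline swaps each stage for a trained component (a subword tokenizer, a neural network, label mappings from the model config), but the data flows through the same three stages.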

### Some of the currently available pipelines are shown below, along with code:

In [8]:
#ZERO-SHOT CLASSIFICATION
#It allows you to specify which labels to use for the classification, so you don't have to rely on the labels of the pretrained model.
from transformers import pipeline

classifier = pipeline("zero-shot-classification")
classifier("This is a course about the Transformers library", candidate_labels=["education", "politics", "business"],)

No model was supplied, defaulted to facebook/bart-large-mnli and revision c626438 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.


{'sequence': 'This is a course about the Transformers library',
 'labels': ['education', 'business', 'politics'],
 'scores': [0.8445989489555359, 0.11197412759065628, 0.04342695698142052]}
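The default zero-shot model, `facebook/bart-large-mnli`, is a natural language inference (NLI) model. As we understand the pipeline, it turns each candidate label into a hypothesis using the default template "This example is {}." and asks the model how strongly the input sequence entails each hypothesis. With the default single-label setting, it softmaxes the entailment logits across the candidate labels. This is why the scores above sum to about 1 (0.845 + 0.112 + 0.043). The sketch below reproduces only the pairing and the final aggregation step. The entailment logits are made-up numbers, not real model output.

```python
import math

HYPOTHESIS_TEMPLATE = "This example is {}."  # default template of the zero-shot pipeline


def build_pairs(sequence, candidate_labels, template=HYPOTHESIS_TEMPLATE):
    """One (premise, hypothesis) pair per candidate label, as fed to an NLI model."""
    return [(sequence, template.format(label)) for label in candidate_labels]


def rank_labels(candidate_labels, entailment_logits):
    """Softmax the entailment logits across labels and sort by score, highest first."""
    m = max(entailment_logits)
    exps = [math.exp(x - m) for x in entailment_logits]
    total = sum(exps)
    scored = sorted(
        zip(candidate_labels, (e / total for e in exps)),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return {
        "labels": [label for label, _ in scored],
        "scores": [score for _, score in scored],
    }


labels = ["education", "politics", "business"]
pairs = build_pairs("This is a course about the Transformers library", labels)
# Hypothetical entailment logits standing in for the NLI model's output.
result = rank_labels(labels, [2.5, -0.5, 0.4])
print(pairs[0][1])
print(result)
```

Because the scores are normalised across the labels you supply, adding or removing a candidate label changes every score. The second example below, with five labels, illustrates this.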

In [9]:
classifier("\"Students from all across the country come to Kota for education purposes,\” the prime minister said. “Congress party has destroyed the dreams of the youth repeatedly \
in the last five years. Congress sold papers for all exams. I want to assure you the one involved in the paper leak, will be sent behind bars. This is the guarantee of Modi,\" \
he added.", candidate_labels=["education", "politics", "business", "person","location"])

{'sequence': '"Students from all across the country come to Kota for education purposes,\\” the prime minister said. “Congress party has destroyed the dreams of the youth repeatedly in the last five years. Congress sold papers for all exams. I want to assure you the one involved in the paper leak, will be sent behind bars. This is the guarantee of Modi," he added.',
 'labels': ['person', 'location', 'education', 'politics', 'business'],
 'scores': [0.3720516860485077,
  0.31952568888664246,
  0.15878579020500183,
  0.11027300357818604,
  0.03936377167701721]}

In [10]:
##TEXT GENERATION
#The main idea here is that you provide a prompt and the model will auto-complete it by generating the remaining text.
generator = pipeline("text-generation")
generator("In this course, we will teach you how to")

No model was supplied, defaulted to gpt2 and revision 6c0e608 (https://huggingface.co/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'In this course, we will teach you how to make an application of the design by simply showing the design files to a human in one click as well as creating the layout.\n\nA second course will concentrate on the Design Patterns and is dedicated to'}]

In [11]:
generator("In Lucknow, there is a place called Husainabad. It's quite spooky", max_length=50, num_return_sequences=5)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': "In Lucknow, there is a place called Husainabad. It's quite spooky in its look. The houses have houses. Houses are not that bad. The streets are not that bad. Sometimes it starts with a small house here. It"},
 {'generated_text': "In Lucknow, there is a place called Husainabad. It's quite spooky, but as I looked around the place I felt my soul return. It was just the same… But there was a story there.\n\nThe story."},
 {'generated_text': "In Lucknow, there is a place called Husainabad. It's quite spooky. My wife and I get to meet strange people all the time. Even some days we have a new friend. I don't know why they become so big"},
 {'generated_text': 'In Lucknow, there is a place called Husainabad. It\'s quite spooky in there". "I think it is the city of Hyderabad where the ghostly ghosts of India have lived as a couple. It\'s a really nice place'},
 {'generated_text': "In Lucknow, there is a place called Husainabad. It's quite spooky.\n\nAnd then there is the Jam

In [12]:
generator = pipeline("text-generation", model="distilgpt2")
generator("In this course, we will teach you how to", max_length=30, num_return_sequences=2)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'In this course, we will teach you how to get started with your online learning habits. The courses come with a new system for setting up and implementing'},
 {'generated_text': 'In this course, we will teach you how to create a framework to create a framework that takes action and will help you to create an amazing and revolutionary'}]
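Text-generation models such as GPT-2 auto-complete a prompt one token at a time. At each step the model predicts a distribution over the next token, one token is chosen and appended, and the loop repeats until a length limit or an end-of-sequence token is reached. The limit is `max_length`, which counts the prompt tokens too. The sketch below imitates that loop at the word level. It uses a hypothetical hand-written bigram table (`BIGRAMS`) in place of a neural network and greedy decoding (always take the most likely next word). The outputs above differ from one another, which indicates that the pipeline samples from the distribution rather than decoding greedily.

```python
# Hypothetical next-word probabilities standing in for a language model.
BIGRAMS = {
    "to": {"build": 0.5, "use": 0.3, "train": 0.2},
    "build": {"models": 0.6, "apps": 0.4},
    "use": {"pipelines": 0.7, "models": 0.3},
    "models": {"with": 0.8, "<eos>": 0.2},
    "with": {"transformers": 0.9, "care": 0.1},
    "transformers": {"<eos>": 1.0},
}


def greedy_generate(prompt, max_length):
    """Append the most likely next word until max_length words (prompt included) or <eos>."""
    words = prompt.split()
    while len(words) < max_length:
        options = BIGRAMS.get(words[-1])
        if not options:
            break  # the toy model has no prediction for this word
        next_word = max(options, key=options.get)
        if next_word == "<eos>":
            break
        words.append(next_word)
    return " ".join(words)


print(greedy_generate("we will teach you how to", max_length=12))
print(greedy_generate("we will teach you how to", max_length=8))
```

The second call stops early because the six prompt words already count toward `max_length`, mirroring how the pipeline's length limit includes the prompt.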

In [13]:
##MASK FILLING
# to fill in the blanks in a given text
from transformers import pipeline

unmasker = pipeline("fill-mask")
unmasker("This course will teach you all about <mask> models", top_k=2)

No model was supplied, defaulted to distilroberta-base and revision ec58a5b (https://huggingface.co/distilroberta-base).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at distilroberta-base were not used when initializing RobertaForMaskedLM: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


[{'score': 0.19631513953208923,
  'token': 30412,
  'token_str': ' mathematical',
  'sequence': 'This course will teach you all about mathematical models'},
 {'score': 0.04449228197336197,
  'token': 745,
  'token_str': ' building',
  'sequence': 'This course will teach you all about building models'}]

In [14]:
unmasker = pipeline('fill-mask', model='bert-base-cased')
unmasker("Hello I'm a [MASK] model.")

Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForMaskedLM: ['bert.pooler.dense.weight', 'cls.seq_relationship.bias', 'bert.pooler.dense.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


[{'score': 0.09019182622432709,
  'token': 4633,
  'token_str': 'fashion',
  'sequence': "Hello I'm a fashion model."},
 {'score': 0.0635000690817833,
  'token': 1207,
  'token_str': 'new',
  'sequence': "Hello I'm a new model."},
 {'score': 0.06228196248412132,
  'token': 2581,
  'token_str': 'male',
  'sequence': "Hello I'm a male model."},
 {'score': 0.0441727377474308,
  'token': 1848,
  'token_str': 'professional',
  'sequence': "Hello I'm a professional model."},
 {'score': 0.03326152265071869,
  'token': 7688,
  'token_str': 'super',
  'sequence': "Hello I'm a super model."}]
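The mask placeholder depends on the model. The RoBERTa-based default expects `<mask>`, while `bert-base-cased` expects `[MASK]`. The loaded tokenizer exposes the right one as `unmasker.tokenizer.mask_token`. Conceptually, the pipeline takes the model's scores over the whole vocabulary at the masked position, applies a softmax, and keeps the `top_k` best candidates. The default is 5, which is why the second call returned five results. The sketch below mimics only that post-processing, with a hypothetical six-word vocabulary and made-up logits.

```python
import math


def fill_mask(text, mask_token, vocab, logits, top_k=5):
    """Softmax the logits at the mask position and return the top_k fillings."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    ranked = sorted(range(len(vocab)), key=lambda i: exps[i], reverse=True)[:top_k]
    return [
        {
            "score": exps[i] / total,
            "token": i,  # index into the toy vocabulary, not a real token id
            "token_str": vocab[i],
            "sequence": text.replace(mask_token, vocab[i], 1),
        }
        for i in ranked
    ]


# Hypothetical vocabulary and logits for the masked position.
vocab = ["fashion", "new", "male", "professional", "super", "banana"]
logits = [2.0, 1.6, 1.5, 1.2, 0.9, -3.0]
for candidate in fill_mask("Hello I'm a [MASK] model.", "[MASK]", vocab, logits, top_k=2):
    print(candidate["sequence"], round(candidate["score"], 3))
```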

In [15]:
##NAMED ENTITY RECOGNITION
from transformers import pipeline

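# aggregation_strategy="simple" merges tokens that belong to the same entity;
# it replaces the older, deprecated grouped_entities=True flag.
ner = pipeline("ner", aggregation_strategy="simple")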
ner("My name is Ajanta and I am a PhD. Scholar in CSE Department at IIT Guwahati.")

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


[{'entity_group': 'PER',
  'score': 0.9910681,
  'word': 'Ajanta',
  'start': 11,
  'end': 17},
 {'entity_group': 'ORG',
  'score': 0.9093222,
  'word': 'CSE Department',
  'start': 45,
  'end': 59},
 {'entity_group': 'ORG',
  'score': 0.89570934,
  'word': 'IIT',
  'start': 63,
  'end': 66},
 {'entity_group': 'LOC',
  'score': 0.9329303,
  'word': 'Guwahati',
  'start': 67,
  'end': 75}]
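Grouping (`grouped_entities=True`, or `aggregation_strategy="simple"` in newer releases) merges consecutive tokens that belong to the same entity. This is how "CSE" and "Department" became a single ORG above. The sketch below is a simplified version of that idea. It assumes token-level predictions in the BIO tagging scheme used by CoNLL-2003 models (`B-ORG` begins an entity, `I-ORG` continues it) and averages the token scores within each group. The token predictions are made up, and the sketch ignores subword merging and character offsets, which the real pipeline also handles.

```python
from statistics import mean


def split_tag(label):
    """Split 'B-ORG' into ('B', 'ORG'); labels without a BIO prefix, such as 'O', count as 'I'."""
    if label.startswith(("B-", "I-")):
        return label[0], label[2:]
    return "I", label


def group_entities(tokens):
    """Merge consecutive (word, label, score) tokens that share an entity type.

    A new group starts when the type changes or a token carries a B- prefix.
    Groups of type 'O' (outside any entity) are dropped at the end.
    """
    groups = []
    for word, label, score in tokens:
        bi, tag = split_tag(label)
        if groups and groups[-1]["entity_group"] == tag and bi != "B":
            groups[-1]["words"].append(word)
            groups[-1]["scores"].append(score)
        else:
            groups.append({"entity_group": tag, "words": [word], "scores": [score]})
    return [
        {"entity_group": g["entity_group"], "score": mean(g["scores"]), "word": " ".join(g["words"])}
        for g in groups
        if g["entity_group"] != "O"
    ]


# Made-up token predictions for the end of the sentence above.
tokens = [
    ("in", "O", 0.99),
    ("CSE", "I-ORG", 0.90),
    ("Department", "I-ORG", 0.92),
    ("at", "O", 0.99),
    ("IIT", "I-ORG", 0.90),
    ("Guwahati", "I-LOC", 0.93),
]
for entity in group_entities(tokens):
    print(entity)
```

"IIT" and "Guwahati" stay separate because their predicted types differ (ORG vs LOC), which matches the two separate entries in the output above.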

In [16]:
##QUESTION-ANSWERING
# To answer questions using information from a given context
from transformers import pipeline
question_answer = pipeline("question-answering")
question_answer(question = "Where do I work?", context = "My name is Ajanta and I word at IIT Guwahati.")

No model was supplied, defaulted to distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.


{'score': 0.9475564360618591, 'start': 32, 'end': 44, 'answer': 'IIT Guwahati'}
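Extractive question answering does not generate new text. The model scores every position in the context as a possible start and end of the answer, and the pipeline returns the best valid span. The `start` and `end` fields are character offsets into the context, so `context[32:44]` is `'IIT Guwahati'`. The sketch below shows the selection step: maximise start probability times end probability, with the end not before the start and the span no longer than a cap. The cap mirrors the pipeline's `max_answer_len` parameter. The per-word probabilities are made up, and the helper names are hypothetical. The real pipeline works on subword tokens and maps them back to characters.

```python
import re


def word_spans(text):
    """Return (word, start_char, end_char) for each run of word characters."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+", text)]


def best_span(start_probs, end_probs, max_answer_len=15):
    """Pick (i, j) with i <= j < i + max_answer_len maximising start_probs[i] * end_probs[j]."""
    best = (0, 0, -1.0)
    for i, p_start in enumerate(start_probs):
        for j in range(i, min(i + max_answer_len, len(end_probs))):
            score = p_start * end_probs[j]
            if score > best[2]:
                best = (i, j, score)
    return best


context = "My name is Ajanta and I word at IIT Guwahati."
spans = word_spans(context)  # 10 words: My, name, ..., IIT, Guwahati
# Made-up probabilities, one value per word of the context.
start_probs = [0.01, 0.01, 0.01, 0.05, 0.01, 0.01, 0.01, 0.01, 0.85, 0.03]
end_probs = [0.01, 0.01, 0.01, 0.04, 0.01, 0.01, 0.01, 0.01, 0.02, 0.87]
i, j, score = best_span(start_probs, end_probs)
start, end = spans[i][1], spans[j][2]
print({"score": score, "start": start, "end": end, "answer": context[start:end]})
```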

In [17]:
##SUMMARIZATION
#task of reducing a text into a shorter text while keeping all (or most) of the important aspects referenced in the text.
from transformers import pipeline
summarizer = pipeline("summarization")
summarizer("""The Ganges (/ˈɡændʒiːz/ GAN-jeez; in India: Ganga, /ˈɡʌŋɡɑː/ GUNG-ah; in Bangladesh: Padma, /ˈpʌdmə/ PUD-mə)[5][6][7][8] is a trans-boundary river of Asia which flows through India and Bangladesh. The 2,525 km (1,569 mi) river rises in the western Himalayas in the Indian state of Uttarakhand. It flows south and east through the Gangetic plain of North India, receiving the right-bank tributary, the Yamuna, which also rises in the western Indian Himalayas, and several left-bank tributaries from Nepal that account for the bulk of its flow.[9][10] In West Bengal state, India, a feeder canal taking off from its right bank diverts 50% of its flow southwards, artificially connecting it to the Hooghly River. The Ganges continues into Bangladesh, its name changing to the Padma. It is then joined by the Jamuna, the lower stream of the Brahmaputra, and eventually the Meghna, forming the major estuary of the Ganges Delta, and emptying into the Bay of Bengal. The Ganges–Brahmaputra–Meghna system is the second-largest river on earth by discharge.[11][12]

The main stem of the Ganges begins at the town of Devprayag,[1] at the confluence of the Alaknanda, which is the source stream in hydrology on account of its greater length, and the Bhagirathi, which is considered the source stream in Hindu mythology.

The Ganges is a lifeline to millions of people who live in its basin and depend on it for their daily needs.[13] It has been important historically, with many former provincial or imperial capitals such as Pataliputra,[14] Kannauj,[14] Sonargaon, Dhaka, Bikrampur, Kara, Munger, Kashi, Patna, Hajipur, Delhi, Bhagalpur, Murshidabad, Baharampur, Kampilya, and Kolkata located on its banks or the banks of tributaries and connected waterways. The river is home to approximately 140 species of fish, 90 species of amphibians, and also reptiles and mammals, including critically endangered species such as the gharial and South Asian river dolphin.[15] The Ganges is the most sacred river to Hindus.[16] It is worshipped as the goddess Ganga in Hinduism.[17]

The Ganges is threatened by severe pollution. This poses a danger not only to humans but also to animals. The levels of fecal coliform bacteria from human waste in the river near Varanasi are more than a hundred times the Indian government's official limit.[15] The Ganga Action Plan, an environmental initiative to clean up the river, has been considered a failure[a][b][18] which is variously attributed to corruption, a lack of will in the government, poor technical expertise,[c] poor environmental planning[d] and a lack of support from religious authorities.[e] """)

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'summary_text': ' The Ganges is a trans-boundary river of Asia which flows through India and Bangladesh . The 2,525 km (1,569 mi) river rises in the western Himalayas in the Indian state of Uttarakhand . It flows south and east through the Gangetic plain of North India . It is then joined by the Jamuna, the lower stream of the Brahmaputra, and eventually the Meghna, forming the major estuary of the Ganges Delta, and emptying into the Bay of Bengal . The river is home to approximately 140 species of fish, 90 species of amphibians, and also reptiles and mammals, including critically endangered species such as the ghar'}]
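Two details of this example can be handled with ordinary string processing. First, the input is pasted from Wikipedia and still contains citation markers such as `[5]` and `[a]`, which carry no meaning for the model. Second, the default model, fine-tuned on CNN/DailyMail, writes a space before each period (" Bangladesh ."). The helpers below are a hedged sketch of optional cleanup outside the pipeline. `strip_citations` and `tidy_summary` are hypothetical names, and the regular expression assumes citation markers are bracketed numbers or single lowercase letters. Separately, the summary stops mid-word ("ghar"), most likely because generation hit the model's configured maximum length. Passing a larger `max_length` to the `summarizer(...)` call should allow a longer summary.

```python
import re

CITATION = re.compile(r"\[(?:\d+|[a-z])\]")  # e.g. [5], [13], [a]
SPACE_BEFORE_PUNCT = re.compile(r"\s+([.,;:!?])")


def strip_citations(text):
    """Remove Wikipedia-style citation markers before summarising."""
    return CITATION.sub("", text)


def tidy_summary(summary):
    """Drop the stray whitespace the model puts before punctuation, plus outer whitespace."""
    return SPACE_BEFORE_PUNCT.sub(r"\1", summary).strip()


print(strip_citations("is a trans-boundary river of Asia.[11][12] It flows south.[a][b]"))
print(tidy_summary(" The Ganges is a trans-boundary river of Asia . It flows south ."))
```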

In [18]:
##TRANSLATION
from transformers import pipeline
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-fr-en")
translator("Ce cours est produit par Hugging Face.")


[{'translation_text': 'This course is produced by Hugging Face.'}]
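Here the translation model is chosen explicitly rather than left to a default. Many Helsinki-NLP OPUS-MT checkpoints on the Hub follow a `Helsinki-NLP/opus-mt-{source}-{target}` naming pattern, as with `fr` to `en` above, so a small helper can build the id from language codes. `opus_mt_model_id` is a hypothetical convenience function. It only formats a string, and not every language pair exists on the Hub, so check the resulting id before passing it as `model=` to `pipeline("translation", ...)`. The tokenizers of these models rely on SentencePiece (the `source.spm`/`target.spm` files), which is why the notebook installs `transformers[sentencepiece]`.

```python
def opus_mt_model_id(source_lang, target_lang):
    """Build an OPUS-MT checkpoint name such as 'Helsinki-NLP/opus-mt-fr-en'.

    Hypothetical helper: it does not check that the checkpoint actually exists.
    """
    for code in (source_lang, target_lang):
        if not code or not code.isalpha():
            raise ValueError(f"expected a language code like 'fr', got {code!r}")
    return f"Helsinki-NLP/opus-mt-{source_lang}-{target_lang}"


print(opus_mt_model_id("fr", "en"))
# Possible use (needs network access and an existing checkpoint):
# translator = pipeline("translation", model=opus_mt_model_id("fr", "en"))
```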