[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/16bjnXxKZzgAc8njhqqVAuVn7EqVr_hcY?usp=sharing)

**NLP use cases**
- Classifying whole sentences
- Classifying each word in a sentence (Named Entity Recognition)
- Answering a question given a context
- Text summarization
- Fill in the blanks
- Translating from one language to another
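
Each use case above corresponds to a task string that is passed as the first argument to `pipeline(...)` in the cells below. The dictionary here collects those strings as a quick index. `task_for` is a hypothetical helper for this notebook, not part of the transformers API.

```python
# Task strings as they are passed to `pipeline(...)` in this notebook.
# `task_for` is a hypothetical convenience helper, not a transformers function.
USE_CASE_TO_TASK = {
    "Classifying whole sentences": "text-classification",
    "Classifying each word in a sentence (Named Entity Recognition)": "token-classification",
    "Answering a question given a context": "question-answering",
    "Text summarization": "summarization",
    "Fill in the blanks": "fill-mask",
    # The notebook translates English to German; other language pairs use other task names.
    "Translating from one language to another": "translation_en_to_de",
}


def task_for(use_case: str) -> str:
    """Return the pipeline task string used for a use case in this notebook."""
    return USE_CASE_TO_TASK[use_case]


for use_case, task in USE_CASE_TO_TASK.items():
    print(f"{use_case:<65} -> {task}")
```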

In [11]:
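%%capture
# Install the transformers library with the optional sentencepiece tokenizer dependency.
# Note: `%%capture` is a cell magic, so it must be the first line of the cell.
!pip install "transformers[sentencepiece]"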

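### Note: you also need TensorFlow or PyTorch installed
With Anaconda, create and activate a TensorFlow environment:

1. `conda create -n tf tensorflow`
2. `conda activate tf`

or, for GPU support:

1. `conda create -n tf-gpu tensorflow-gpu`
2. `conda activate tf-gpu`

Alternatively, install PyTorch, which `transformers` also supports.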
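
Before installing anything, you can check which backend is already available. This is a minimal sketch that uses only the standard library (`importlib.util.find_spec`) and assumes the usual import names `tensorflow` and `torch`.

```python
import importlib.util


def available_backends():
    """Return the deep learning backends (by import name) that are installed."""
    # find_spec returns None when a top-level package cannot be found,
    # without actually importing the package.
    return [name for name in ("tensorflow", "torch")
            if importlib.util.find_spec(name) is not None]


backends = available_backends()
if backends:
    print("Installed backends:", ", ".join(backends))
else:
    print("Neither TensorFlow nor PyTorch is installed; install one of them first.")
```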

In [12]:
# Import the `pipeline` function from `transformers` and the standard-library `textwrap` module.
# `pipeline` wraps a pre-trained model, its tokenizer, and the pre- and post-processing for a given NLP task
# (e.g. text classification, question answering, text generation) behind a single callable.
# `textwrap.TextWrapper` formats long strings into lines of at most `width` characters. Here it is configured
# with `width=80`, `break_long_words=False` (words longer than the width are never split, so such a line may
# exceed the width) and `break_on_hyphens=False` (lines are broken only at whitespace, never at hyphens).

from transformers import pipeline
import textwrap
wrapper = textwrap.TextWrapper(width=80, break_long_words=False, break_on_hyphens=False)
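
To see what the two `False` options change, the sketch below uses the same options with a narrow width of 10 so the effect is visible, and compares them with a default `TextWrapper`. The example strings are made up.

```python
import textwrap

# Same options as the notebook's `wrapper`, but only 10 characters wide.
strict = textwrap.TextWrapper(width=10, break_long_words=False, break_on_hyphens=False)
default = textwrap.TextWrapper(width=10)  # break_long_words=True, break_on_hyphens=True

# A word longer than the width stays whole on its own (over-long) line.
print(strict.wrap("a supercalifragilistic word"))
# ['a', 'supercalifragilistic', 'word']

# The default wrapper splits the long word so every line fits in 10 characters.
print(default.wrap("a supercalifragilistic word"))

# With break_on_hyphens=False, hyphenated compounds are never split at a hyphen.
print(strict.wrap("state-of-the-art models"))
# ['state-of-the-art', 'models']
```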

## Classifying whole sentences

In [13]:
# Classify the sentiment of a passage with a Hugging Face `pipeline` (imported in the previous cell, together
# with the `wrapper` used to format the output).
# The pipeline uses the 'text-classification' task and the `distilbert-base-uncased-finetuned-sst-2-english` model,
# a DistilBERT model fine-tuned on SST-2, the binary (positive/negative) version of the Stanford Sentiment Treebank,
# a dataset of sentences from movie reviews.
# Calling `classifier(sentence)` returns a list with one dictionary per input, holding the predicted `label`
# (POSITIVE or NEGATIVE) and its `score` (the model's confidence in that label).
# The cell prints the wrapped passage and the predicted label; the score is available as `c[0]['score']`.

sentence = 'The flights were on time both in Sydney and the connecting flight in Singapore. The organisation to cope with the COVID 19 restrictions while in transit was well planned and directions easy to follow, the plane was comfortable with a reasonable selection of in flight entertainment. Crew were pleasant and helpful.'
classifier = pipeline('text-classification', model='distilbert-base-uncased-finetuned-sst-2-english')
c = classifier(sentence)
print('\nSentence:')
print(wrapper.fill(sentence))
print(f"\nThis sentence is classified with a {c[0]['label']} sentiment")


Sentence:
The flights were on time both in Sydney and the connecting flight in Singapore.
The organisation to cope with the COVID 19 restrictions while in transit was
well planned and directions easy to follow, the plane was comfortable with a
reasonable selection of in flight entertainment. Crew were pleasant and helpful.

This sentence is classified with a POSITIVE sentiment
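
Under the hood, a classifier like this produces one raw score (logit) per class; the pipeline converts these to probabilities with a softmax and reports the most likely class together with its probability. The sketch below reproduces only that last step in plain Python. The logits are invented, and the `{0: "NEGATIVE", 1: "POSITIVE"}` mapping is an assumption standing in for the model's `id2label` configuration.

```python
import math


def postprocess(logits, id2label):
    """Softmax the logits and return the top label with its probability."""
    # Subtract the largest logit before exponentiating, for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": id2label[best], "score": probs[best]}


# Invented logits for a single input; the label mapping is assumed, not read from the model.
id2label = {0: "NEGATIVE", 1: "POSITIVE"}
result = postprocess([-1.0, 2.0], id2label)
print(result["label"], round(result["score"], 4))  # POSITIVE 0.9526
```

In the notebook itself, the corresponding probability for the review is available as `c[0]['score']`.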


## Classifying each word in a sentence (Named Entity Recognition)

In [14]:
# Named entity recognition (NER) with the pre-trained `dbmdz/bert-large-cased-finetuned-conll03-english` model,
# a BERT-large model fine-tuned on the CoNLL-2003 English NER dataset (entity types PER, ORG, LOC and MISC).
# The 'token-classification' task assigns an entity label to each token of the input.
# `aggregation_strategy='simple'` (the replacement for the deprecated `grouped_entities=True`) merges consecutive
# tokens that belong to the same entity, so "Chew", "Choon" and "Seng" come back as a single PER entity.
# `ners` is a list of dictionaries, one per entity, with keys such as 'entity_group', 'word' and 'score'.
# The cell prints the wrapped sentence, then each entity with its entity group.

sentence = "Singapore Airlines was the first airline to fly the A380. Chew Choon Seng was Singapore Airline's CEO at the time. Singapore Airlines flies to New York daily."
ner = pipeline('token-classification', model='dbmdz/bert-large-cased-finetuned-conll03-english', aggregation_strategy='simple')
ners = ner(sentence)
print('\nSentence:')
print(wrapper.fill(sentence))
print('\n')
for n in ners:
  print(f"{n['word']} -> {n['entity_group']}")


Sentence:
Singapore Airlines was the first airline to fly the A380. Chew Choon Seng was
Singapore Airline's CEO at the time. Singapore Airlines flies to New York daily.


Singapore Airlines -> ORG
A380 -> MISC
Chew Choon Seng -> PER
Singapore Airline -> ORG
Singapore Airlines -> ORG
New York -> LOC
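
The merging performed by the aggregation step can be sketched in a few lines. This simplified version works on whole words that are already tagged (the real pipeline also has to re-join sub-word tokens) and assumes IOB-style tags such as `B-ORG`, `I-ORG` and `O`: an entity continues only on an `I-` tag of the same type, and `O` words are dropped. `group_entities` and the example tags are illustrative, not model output.

```python
def group_entities(tagged_words):
    """Merge consecutive (word, tag) pairs of the same entity type into (text, type) tuples.

    'B-X' begins an entity of type X, 'I-X' continues one of the same type
    (or begins one if the type changes), and 'O' marks words outside any entity.
    """
    entities = []
    current_words, current_type = [], None
    for word, tag in tagged_words:
        prefix, _, ent_type = tag.partition("-")
        # Extend the open entity only for an I- tag of the same type.
        if prefix == "I" and ent_type == current_type:
            current_words.append(word)
            continue
        # Otherwise close the open entity, if any...
        if current_words:
            entities.append((" ".join(current_words), current_type))
        # ...and start a new one unless the word is outside any entity.
        if prefix in ("B", "I"):
            current_words, current_type = [word], ent_type
        else:
            current_words, current_type = [], None
    if current_words:
        entities.append((" ".join(current_words), current_type))
    return entities


tagged = [
    ("Chew", "B-PER"), ("Choon", "I-PER"), ("Seng", "I-PER"),
    ("was", "O"), ("Singapore", "B-ORG"), ("Airlines", "I-ORG"),
    ("CEO", "O"), ("New", "B-LOC"), ("York", "I-LOC"),
]
print(group_entities(tagged))
# [('Chew Choon Seng', 'PER'), ('Singapore Airlines', 'ORG'), ('New York', 'LOC')]
```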


## Answering a question given a context

In [15]:
context = '''
Singapore Airlines was founded in 1947 and was originally known as Malayan Airways. It is the national airline of Singapore and is based at Singapore Changi Airport. 
From this hub, the airline flies to more than 60 destinations, with flights to Seoul, Tokyo and Melbourne among the most popular of its routes. 
It is particularly strong in Southeast Asian and Australian destinations (the so-called Kangaroo Route), but also flies to 6 different continents, covering 35 countries.
There are more than 100 planes in the Singapore Airlines fleet, most of which are Airbus aircraft plus a smaller amount of Boeings.
The company is known for frequently updating the aircraft in its fleet.'''


question = 'How many aircrafts does Singapore Airlines have?'

print('Text:')
print(wrapper.fill(context))
print('\nQuestion:')
print(question)

Text:
 Singapore Airlines was founded in 1947 and was originally known as Malayan
Airways. It is the national airline of Singapore and is based at Singapore
Changi Airport.  From this hub, the airline flies to more than 60 destinations,
with flights to Seoul, Tokyo and Melbourne among the most popular of its routes.
It is particularly strong in Southeast Asian and Australian destinations (the
so-called Kangaroo Route), but also flies to 6 different continents, covering 35
countries. There are more than 100 planes in the Singapore Airlines fleet, most
of which are Airbus aircraft plus a smaller amount of Boeings. The company is
known for frequently updating the aircraft in its fleet.

Question:
How many aircrafts does Singapore Airlines have?


In [16]:
# Build a question-answering pipeline with the `distilbert-base-cased-distilled-squad` model (DistilBERT
# fine-tuned on the SQuAD question-answering dataset) and use it to answer `question` from `context`.
# `context` is the passage that contains the answer; `question` is the question being asked.
# This is extractive QA: the answer is a span copied from the context, and the result is a dictionary
# with the keys 'answer', 'score', 'start' and 'end'.

from transformers import pipeline

qa = pipeline('question-answering', model='distilbert-base-cased-distilled-squad')

print('\nQuestion:')
print(question + '\n')
print('Answer:')
a = qa(context=context, question=question)
a['answer']


Question:
How many aircrafts does Singapore Airlines have?

Answer:


'more than 100'
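
Extractive QA models like this one score every context token twice: once as a possible answer start and once as a possible answer end. The answer is the span with the highest combined score, subject to start <= end and a length cap. Adding raw start and end scores ranks spans the same way as multiplying their softmax probabilities, because the softmax normalizers are the same for every span. The sketch below shows this search on a toy example; the tokens and scores are invented, and the real pipeline also masks out question tokens and maps token positions back to character offsets.

```python
def best_span(start_scores, end_scores, max_answer_len=15):
    """Return (start, end) token indices maximizing start + end score with start <= end."""
    best_score, best = float("-inf"), (0, 0)
    for i, s in enumerate(start_scores):
        # Only consider ends at or after the start and within the length cap.
        for j in range(i, min(i + max_answer_len, len(end_scores))):
            score = s + end_scores[j]
            if score > best_score:
                best_score, best = score, (i, j)
    return best


tokens = ["There", "are", "more", "than", "100", "planes"]
start_scores = [0.1, 0.2, 3.0, 0.5, 1.0, 0.0]  # invented numbers
end_scores = [0.0, 0.1, 0.2, 0.3, 2.5, 1.0]    # invented numbers
i, j = best_span(start_scores, end_scores)
print(" ".join(tokens[i:j + 1]))  # more than 100
```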

## Text summarization

In [17]:
review = '''
Extremely unusual time to fly as we needed an exemption to fly out of Australia from the government. We obtained one as working in Tokyo for the year as teachers.
The check in procedure does take a lot longer as more paperwork and phone calls are needed to check if you are allowed to travel. The staff were excellent in explaining the procedure as they are working with very few numbers.
The flight had 40 people only, so lots of room and yes we had 3 seats each. The service of meals and beverages was done very quickly and efficiently.
Changi airport was like a ghost town with most shops closed and all passengers are walked/transported to a transit zone until your next flight is ready. You are then walked in single file or transported to your next flight, so very strange as at times their seemed be more workers in PPE gear than passengers.
The steps we went through at Narita were extensive, downloading apps, fill in paperwork and giving a saliva sample to test for covid 19. 
It took about 2 hours to get through the steps and we only sat down for maybe 10 minutes at the last stop to get back your covid results. 
The people involved were fantastic and we were lucky that we were numbers two and three in the initial first line up, but still over 2 hours it took so be aware. We knew we were quick as the people picking us up told us we were first out.'''

print('\nOriginal text:\n')
print(wrapper.fill(review))
summarize = pipeline('summarization', model='sshleifer/distilbart-cnn-12-6')
summarized_text = summarize(review)[0]['summary_text']
print('\nSummarized text:')
print(wrapper.fill(summarized_text))


Original text:

 Extremely unusual time to fly as we needed an exemption to fly out of Australia
from the government. We obtained one as working in Tokyo for the year as
teachers. The check in procedure does take a lot longer as more paperwork and
phone calls are needed to check if you are allowed to travel. The staff were
excellent in explaining the procedure as they are working with very few numbers.
The flight had 40 people only, so lots of room and yes we had 3 seats each. The
service of meals and beverages was done very quickly and efficiently. Changi
airport was like a ghost town with most shops closed and all passengers are
walked/transported to a transit zone until your next flight is ready. You are
then walked in single file or transported to your next flight, so very strange
as at times their seemed be more workers in PPE gear than passengers. The steps
we went through at Narita were extensive, downloading apps, fill in paperwork
and giving a saliva sample to test for covid 
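
Summarization models accept only a limited number of input tokens, so longer texts are commonly split into chunks that are summarized separately. Below is a minimal sketch of sentence-based chunking. It assumes whitespace-separated words as a rough stand-in for model tokens and uses a naive sentence split; `chunk_sentences` and its `max_words` budget are illustrative choices, not limits taken from `sshleifer/distilbart-cnn-12-6`.

```python
import re


def chunk_sentences(text, max_words=120):
    """Group sentences into chunks of at most max_words whitespace-separated words.

    A single sentence longer than max_words becomes its own (over-long) chunk.
    """
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        # Start a new chunk if adding this sentence would exceed the budget.
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


print(chunk_sentences("One two three. Four five. Six seven eight nine.", max_words=5))
# ['One two three. Four five.', 'Six seven eight nine.']
```

Each chunk could then be passed to `summarize(...)` in the same way as `review` above.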

## Fill in the blanks

In [18]:
# Use the `fill-mask` pipeline with the `distilroberta-base` model to predict the word hidden by `<mask>`
# (RoBERTa's mask token) in "It is the national <mask> of Singapore".
# By default the pipeline returns the five most likely completions, each as a dictionary whose 'sequence'
# key holds the full sentence with `<mask>` replaced by the predicted token; the loop prints one per line.

sentence = 'It is the national <mask> of Singapore'
mask = pipeline('fill-mask', model='distilroberta-base')
masks = mask(sentence)
for m in masks:
  print(m['sequence'])

It is the national anthem of Singapore
It is the national capital of Singapore
It is the national pride of Singapore
It is the national treasure of Singapore
It is the national motto of Singapore
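
Conceptually, the fill-mask pipeline scores every vocabulary token for the `<mask>` position, turns the scores into probabilities with a softmax, and keeps the top five. The sketch below mimics that on a tiny vocabulary. The `token_scores` values are invented for illustration and are not taken from `distilroberta-base`, whose vocabulary has tens of thousands of sub-word tokens.

```python
import heapq
import math


def fill_mask(template, token_scores, top_k=5, mask="<mask>"):
    """Return the top_k (sequence, probability) pairs for a template with one mask."""
    # Softmax over the (toy) vocabulary scores, shifted by the max for stability.
    m = max(token_scores.values())
    exps = {tok: math.exp(s - m) for tok, s in token_scores.items()}
    total = sum(exps.values())
    top = heapq.nlargest(top_k, exps.items(), key=lambda kv: kv[1])
    return [(template.replace(mask, tok, 1), e / total) for tok, e in top]


# Invented scores for illustration only.
token_scores = {"anthem": 3.0, "capital": 2.5, "pride": 2.0,
                "treasure": 1.5, "motto": 1.0, "airline": 0.5}
results = fill_mask("It is the national <mask> of Singapore", token_scores)
for sequence, prob in results:
    print(f"{prob:.3f}  {sequence}")
```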


In [19]:
sentence = 'Singapore Airlines is the national <mask> of Singapore'
mask = pipeline('fill-mask', model='distilroberta-base')
masks = mask(sentence)
for m in masks:
  print(m['sequence'])

Singapore Airlines is the national airline of Singapore
Singapore Airlines is the national carrier of Singapore
Singapore Airlines is the national airport of Singapore
Singapore Airlines is the national airlines of Singapore
Singapore Airlines is the national capital of Singapore


## Translation (English to German)

In [20]:
# Translate English to German with the `pipeline` function, using the 'translation_en_to_de' task and the
# `t5-base` model.
# Calling `translator(english)` returns a list with one dictionary, which holds the German text under the
# key 'translation_text'.
# The cell prints the original and the translated sentence.

english = '''Singapore Airlines is my favourite airline'''

translator = pipeline('translation_en_to_de', model='t5-base')
german = translator(english)
print('\nEnglish:')
print(english)
print('\nGerman:')
print(german[0]['translation_text'])


English:
Singapore Airlines is my favourite airline

German:
Singapore Airlines ist meine Lieblingsfluggesellschaft
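
T5 is a text-to-text model: the task is selected by a plain-text prefix on the input, and for English-to-German translation that prefix is `translate English to German: `. As far as we can tell, the `translation_en_to_de` pipeline adds this prefix automatically from the model's configuration, so the notebook passes the bare sentence. The sketch below only builds the prompt string; `t5_translation_prompt` is a hypothetical helper, not a transformers function.

```python
def t5_translation_prompt(text, source="English", target="German"):
    """Build a T5-style text-to-text input for translation (hypothetical helper)."""
    return f"translate {source} to {target}: {text}"


print(t5_translation_prompt("Singapore Airlines is my favourite airline"))
# translate English to German: Singapore Airlines is my favourite airline
```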
