<a href="https://colab.research.google.com/github/Zibraan/My_ML_DL_Codes/blob/main/TF2_0_NLP_tasks_with_Transformers.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1fBiTjumxoFCRQf1zOtZUokC9mJZTw9Y6?usp=sharing)

**NLP use cases**
- Classifying whole sentences
- Classifying each word in a sentence
- Answering a question
- Text summarization
- Fill in the blanks
- Translating from one language to another
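
Each use case above maps to one `pipeline` task string. As a rough overview, the sketch below collects the task and model names that appear in the cells of this notebook. The `USE_CASES` dict is only an illustrative name and is not part of the transformers library.

```python
# Task string and model checkpoint used for each use case in this notebook.
# USE_CASES is a hypothetical helper structure, introduced here for illustration only.
USE_CASES = {
    "Classifying whole sentences": ("text-classification", "distilbert-base-uncased-finetuned-sst-2-english"),
    "Classifying each word in a sentence": ("token-classification", "dbmdz/bert-large-cased-finetuned-conll03-english"),
    "Answering a question": ("question-answering", "distilbert-base-cased-distilled-squad"),
    "Text summarization": ("summarization", "sshleifer/distilbart-cnn-12-6"),
    "Fill in the blanks": ("fill-mask", "distilroberta-base"),
    "Translating from one language to another": ("translation_en_to_de", "t5-base"),
}

for use_case, (task, model) in USE_CASES.items():
    print(f"{use_case}: pipeline({task!r}, model={model!r})")
```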

In [None]:
import tensorflow
tensorflow.__version__

'2.9.2'

In [None]:
%%capture
!pip install transformers

In [None]:
from transformers import pipeline
import textwrap
wrapper = textwrap.TextWrapper(width=80, break_long_words=False, break_on_hyphens=False)
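
To see what the two `break_*` options in the wrapper change, here is a small standard-library sketch. It uses a deliberately narrow width of 20 so the difference is visible; `demo_wrapper` and the sample text are made up for illustration.

```python
import textwrap

# Same options as the notebook's wrapper, but a narrow width (assumption, for demonstration).
demo_wrapper = textwrap.TextWrapper(width=20, break_long_words=False, break_on_hyphens=False)

text = "A well-planned in-flight entertainment selection"

# With break_on_hyphens=False, hyphenated words are kept whole on one line.
wrapped = demo_wrapper.fill(text)
print(wrapped)

# The default settings may split a word after its hyphen to fill the line.
default_wrapped = textwrap.fill(text, width=20)
print("---")
print(default_wrapped)
```

With these settings, `in-flight` always stays on a single line, which keeps the printed review text easier to read.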

## Classifying whole sentences

In [None]:
sentence = 'The flights were on time both in Sydney and the connecting flight in Singapore. The organisation to cope with the COVID 19 restrictions while in transit was well planned and directions easy to follow, the plane was comfortable with a reasonable selection of in flight entertainment. Crew were pleasant and helpful.'

classifier = pipeline('text-classification', model='distilbert-base-uncased-finetuned-sst-2-english')
c = classifier(sentence)
print('\nSentence:')
print(wrapper.fill(sentence))
print(f"\nThis sentence is classified with a {c[0]['label']} sentiment")


Sentence:
The flights were on time both in Sydney and the connecting flight in Singapore.
The organisation to cope with the COVID 19 restrictions while in transit was
well planned and directions easy to follow, the plane was comfortable with a
reasonable selection of in flight entertainment. Crew were pleasant and helpful.

This sentence is classified with a POSITIVE sentiment
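
The classifier returns a list with one dict per input, each holding a `label` and a `score`. The cell above only prints the label. A minimal sketch of also using the score is shown below. `describe_prediction` and its `min_score` threshold are hypothetical, and the sample score is invented rather than taken from a real run.

```python
def describe_prediction(results, min_score=0.9):
    """Summarise the top text-classification result as a short phrase."""
    top = results[0]
    # Treat scores below the (arbitrary) threshold as low-confidence predictions.
    certainty = "confidently" if top["score"] >= min_score else "tentatively"
    return f"{certainty} {top['label']} (score {top['score']:.2f})"

# Hypothetical result in the shape the pipeline returns; the score is made up.
sample = [{"label": "POSITIVE", "score": 0.98}]
print(describe_prediction(sample))
```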


## Classifying each word in a sentence

In [None]:
sentence = "Singapore Airlines was the first airline to fly the A380. Chew Choon Seng was Singapore Airline's CEO at the time. Singapore Airlines flies to New York daily."
# aggregation_strategy='simple' replaces the deprecated grouped_entities=True
# and merges sub-word tokens into whole entities.
ner = pipeline('token-classification', model='dbmdz/bert-large-cased-finetuned-conll03-english', aggregation_strategy='simple')
ners = ner(sentence)
print('\nSentence:')
print(wrapper.fill(sentence))
print('\n')
for n in ners:
  print(f"{n['word']} -> {n['entity_group']}")



Sentence:
Singapore Airlines was the first airline to fly the A380. Chew Choon Seng was
Singapore Airline's CEO at the time. Singapore Airlines flies to New York daily.


Singapore Airlines -> ORG
A380 -> MISC
Chew Choon Seng -> PER
Singapore Airline -> ORG
Singapore Airlines -> ORG
New York -> LOC
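
The flat list of entities can be easier to work with once it is grouped by type. The sketch below does this with the standard library, using the word and entity group pairs printed above. `group_by_type` is a hypothetical helper, and the real pipeline output contains further keys (such as scores and offsets) that are left out here.

```python
from collections import defaultdict

# Entities copied from the output above, reduced to the two keys the notebook prints.
entities = [
    {"word": "Singapore Airlines", "entity_group": "ORG"},
    {"word": "A380", "entity_group": "MISC"},
    {"word": "Chew Choon Seng", "entity_group": "PER"},
    {"word": "Singapore Airline", "entity_group": "ORG"},
    {"word": "Singapore Airlines", "entity_group": "ORG"},
    {"word": "New York", "entity_group": "LOC"},
]

def group_by_type(ents):
    """Collect unique entity words under their entity group, keeping first-seen order."""
    grouped = defaultdict(list)
    for ent in ents:
        if ent["word"] not in grouped[ent["entity_group"]]:
            grouped[ent["entity_group"]].append(ent["word"])
    return dict(grouped)

for group, words in group_by_type(entities).items():
    print(f"{group}: {', '.join(words)}")
```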


## Answering a question

In [None]:
context = '''
Singapore Airlines was founded in 1947 and was originally known as Malayan Airways. It is the national airline of Singapore and is based at Singapore Changi Airport.
From this hub, the airline flies to more than 60 destinations, with flights to Seoul, Tokyo and Melbourne among the most popular of its routes.
It is particularly strong in Southeast Asian and Australian destinations (the so-called Kangaroo Route), but also flies to 6 different continents, covering 35 countries.
There are more than 100 planes in the Singapore Airlines fleet, most of which are Airbus aircraft plus a smaller amount Boeings.
The company is known for frequently updating the aircraft in its fleet.'''


question = 'How many aircrafts does Singapore Airlines have?'

print('Text:')
print(wrapper.fill(context))
print('\nQuestion:')
print(question)

Text:
 Singapore Airlines was founded in 1947 and was originally known as Malayan
Airways. It is the national airline of Singapore and is based at Singapore
Changi Airport.  From this hub, the airline flies to more than 60 destinations,
with flights to Seoul, Tokyo and Melbourne among the most popular of its routes.
It is particularly strong in Southeast Asian and Australian destinations (the
so-called Kangaroo Route), but also flies to 6 different continents, covering 35
countries. There are more than 100 planes in the Singapore Airlines fleet, most
of which are Airbus aircraft plus a smaller amount Boeings. The company is known
for frequently updating the aircraft in its fleet.

Question:
How many aircrafts does Singapore Airlines have?


In [None]:
# pipeline was already imported in an earlier cell
qa = pipeline('question-answering', model='distilbert-base-cased-distilled-squad')

print('\nQuestion:')
print(question + '\n')
print('Answer:')
a = qa(context=context, question=question)
a['answer']


Question:
How many aircrafts does Singapore Airlines have?

Answer:


'more than 100'
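
Besides `answer`, the question-answering result also carries a `score` and the `start` and `end` character offsets of the answer within the context. A minimal sketch of using those offsets to mark the answer in the text follows. `highlight_answer` is a hypothetical helper, and the result dict is built by hand here (via `str.find`) rather than produced by the model.

```python
def highlight_answer(text, result, marker="**"):
    """Wrap the answer span, given by character offsets, in a marker."""
    start, end = result["start"], result["end"]
    return text[:start] + marker + text[start:end] + marker + text[end:]

snippet = "There are more than 100 planes in the Singapore Airlines fleet."
answer = "more than 100"

# Hand-built stand-in for the pipeline result: same keys, score omitted.
start = snippet.find(answer)
result = {"answer": answer, "start": start, "end": start + len(answer)}

print(highlight_answer(snippet, result))
```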

## Text summarization

In [None]:
review = '''
Extremely unusual time to fly as we needed an exemption to fly out of Australia from the government. We obtained one as working in Tokyo for the year as teachers.
The check in procedure does take a lot longer as more paperwork and phone calls are needed to check if you are allowed to travel. The staff were excellent in explaining the procedure as they are working with very few numbers.
The flight had 40 people only, so lots of room and yes we had 3 seats each. The service of meals and beverages was done very quickly and efficiently.
Changi airport was like a ghost town with most shops closed and all passengers are walked/transported to a transit zone until your next flight is ready. You are then walked in single file or transported to your next flight, so very strange as at times their seemed be more workers in PPE gear than passengers.
The steps we went through at Narita were extensive, downloading apps, fill in paperwork and giving a saliva sample to test for covid 19.
It took about 2 hours to get through the steps and we only sat down for maybe 10 minutes at the last stop to get back your covid results.
The people involved were fantastic and we were lucky that we were numbers two and three in the initial first line up, but still over 2 hours it took so be aware. We knew we were quick as the people picking us up told us we were first out.'''

print('\nOriginal text:\n')
print(wrapper.fill(review))
summarize = pipeline('summarization', model='sshleifer/distilbart-cnn-12-6')
summarized_text = summarize(review)[0]['summary_text']
print('\nSummarized text:')
print(wrapper.fill(summarized_text))


Original text:

 Extremely unusual time to fly as we needed an exemption to fly out of Australia
from the government. We obtained one as working in Tokyo for the year as
teachers. The check in procedure does take a lot longer as more paperwork and
phone calls are needed to check if you are allowed to travel. The staff were
excellent in explaining the procedure as they are working with very few numbers.
The flight had 40 people only, so lots of room and yes we had 3 seats each. The
service of meals and beverages was done very quickly and efficiently. Changi
airport was like a ghost town with most shops closed and all passengers are
walked/transported to a transit zone until your next flight is ready. You are
then walked in single file or transported to your next flight, so very strange
as at times their seemed be more workers in PPE gear than passengers. The steps
we went through at Narita were extensive, downloading apps, fill in paperwork
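and giving a saliva sample to test for covid 19. It took about 2 hours to get
through the steps and we only sat down for maybe 10 minutes at the last stop to
get back your covid results. The people involved were fantastic and we were
lucky that we were numbers two and three in the initial first line up, but still
over 2 hours it took so be aware. We knew we were quick as the people picking us
up told us we were first out.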


Summarized text:
 The flight had 40 people only, so lots of room and yes we had 3 seats each .
The check in procedure does take a lot longer as more paperwork and phone calls
are needed to check if you are allowed to travel . The staff were excellent in
explaining the procedure as they are working with very few numbers .
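
The summary above contains a leading space and a stray space before each full stop. One possible clean-up, sketched with a regular expression, is shown below. `tidy_summary` is a hypothetical helper, not something the summarization pipeline provides.

```python
import re

def tidy_summary(text):
    """Remove whitespace before punctuation and trim the ends of the string."""
    return re.sub(r"\s+([.,!?;:])", r"\1", text).strip()

# First sentence of the summary printed above, including its spacing quirks.
raw = " The flight had 40 people only, so lots of room and yes we had 3 seats each ."
print(tidy_summary(raw))
```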


## Fill in the blanks

In [None]:
sentence = 'It is the national <mask> of Singapore'
mask = pipeline('fill-mask', model='distilroberta-base')
masks = mask(sentence)
for m in masks:
  print(m['sequence'])

It is the national anthem of Singapore
It is the national capital of Singapore
It is the national pride of Singapore
It is the national treasure of Singapore
It is the national motto of Singapore


In [None]:
sentence = 'Singapore Airlines is the national <mask> of Singapore'
# Reuse the fill-mask pipeline (mask) created in the previous cell
masks = mask(sentence)
for m in masks:
  print(m['sequence'])

Singapore Airlines is the national airline of Singapore
Singapore Airlines is the national carrier of Singapore
Singapore Airlines is the national airport of Singapore
Singapore Airlines is the national airlines of Singapore
Singapore Airlines is the national capital of Singapore
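
Comparing the two runs shows how much the surrounding context steers the prediction. To look at just the predicted words, they can be cut out of the completed sentences, as in the sketch below. `filled_words` is a hypothetical helper that assumes each returned sequence reproduces the template text verbatim around the mask. The pipeline's result dicts also include a `token_str` field with the predicted token.

```python
def filled_words(template, sequences, mask_token="<mask>"):
    """Extract the word that replaced the mask in each completed sequence."""
    # Assumes exactly one mask token and unchanged text on both sides of it.
    prefix, suffix = template.split(mask_token)
    return [seq[len(prefix):len(seq) - len(suffix)] for seq in sequences]

template = "Singapore Airlines is the national <mask> of Singapore"

# Completed sentences copied from the output above.
sequences = [
    "Singapore Airlines is the national airline of Singapore",
    "Singapore Airlines is the national carrier of Singapore",
    "Singapore Airlines is the national airport of Singapore",
    "Singapore Airlines is the national airlines of Singapore",
    "Singapore Airlines is the national capital of Singapore",
]

print(filled_words(template, sequences))
```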


## Translation (English to German)

In [None]:
english = '''It took about 2 hours to get through the steps and we only sat down for maybe 10 minutes at the last stop to get back your covid results. '''

translator = pipeline('translation_en_to_de', model='t5-base')
german = translator(english)
print('\nEnglish:')
print(english)
print('\nGerman:')
print(german[0]['translation_text'])



English:
It took about 2 hours to get through the steps and we only sat down for maybe 10 minutes at the last stop to get back your covid results. 

German:
Es dauerte ca. 2 Stunden, die Schritte zu durchlaufen und wir saßen nur für etwa 10 Minuten an der letzten Haltestelle, um Ihre Ergebnisse zurückzuholen.
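
The example translates a single sentence, but `t5-base` has a maximum input length of 512 tokens, so longer texts such as the full review would need to be split first. A rough sketch of sentence-based chunking follows. `chunk_sentences` is a hypothetical helper, and its word budget is only a crude stand-in for the tokenizer's real token count.

```python
import re

def chunk_sentences(text, max_words=100):
    """Group whole sentences into chunks of at most max_words words each.

    A single sentence longer than max_words still becomes its own chunk.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, count = [], [], 0
    for sentence in sentences:
        if not sentence:
            continue
        n_words = len(sentence.split())
        # Start a new chunk when adding this sentence would exceed the budget.
        if current and count + n_words > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n_words
    if current:
        chunks.append(" ".join(current))
    return chunks

text = "The flight had 40 people only. The service was quick. Changi airport was like a ghost town."
for chunk in chunk_sentences(text, max_words=10):
    print(chunk)
```

Each chunk could then be passed to `translator` separately and the translations joined afterwards.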
