<a href="https://colab.research.google.com/github/RajnandiniG/ML_NLP-TF/blob/main/2_Nlp_Transformers.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1fBiTjumxoFCRQf1zOtZUokC9mJZTw9Y6?usp=sharing)

In [1]:
import tensorflow
tensorflow.__version__

'2.15.0'

In [2]:
%%capture
!pip install transformers

In [3]:
from transformers import pipeline
import textwrap
wrapper = textwrap.TextWrapper(width=80, break_long_words=False, break_on_hyphens=False)
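
The `wrapper` defined above only changes how text is printed, not what the models see. A small sketch of what its options do, using a narrower width and a made-up string (`demo_wrapper` and `sample` are illustrative names, not part of the notebook's data):

```python
import textwrap

# Same options as the notebook's wrapper, but a narrow width so the effect is visible.
demo_wrapper = textwrap.TextWrapper(width=20, break_long_words=False, break_on_hyphens=False)

# Hypothetical sample text; fill() treats the newline as ordinary whitespace.
sample = "in-flight entertainment\nwas reasonable"
wrapped = demo_wrapper.fill(sample)
print(wrapped)
```

Because `break_on_hyphens=False`, a hyphenated word such as "in-flight" is kept as a single unit rather than being split at the hyphen.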

##Classifying whole sentences

In [4]:
sentence = 'The flights were never on time both in Sydney and the connecting flight in Singapore. The organisation to cope with the COVID 19 restrictions while in transit was not well planned and directions easy to follow, the plane was not comfortable with a reasonable selection of in flight entertainment. Crew were pleasant and helpful.'

classifier = pipeline('text-classification', model='distilbert-base-uncased-finetuned-sst-2-english')
c = classifier(sentence)
print('\nSentence:')
print(wrapper.fill(sentence))
print(f"\nThis sentence is classified with a {c[0]['label']} sentiment")


Sentence:
The flights were never on time both in Sydney and the connecting flight in
Singapore. The organisation to cope with the COVID 19 restrictions while in
transit was not well planned and directions easy to follow, the plane was not
comfortable with a reasonable selection of in flight entertainment. Crew were
pleasant and helpful.

This sentence is classified with a NEGATIVE sentiment
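
Besides the label, the pipeline result also carries a `score`. For a two-class model such as this one, the reported label is the class with the higher softmax probability. A minimal sketch of that step, assuming made-up logits (the `softmax` helper and the numbers are illustrative, not real model output):

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for the two sentiment classes; not produced by the model.
labels = ["NEGATIVE", "POSITIVE"]
logits = [2.0, -1.0]

probs = softmax(logits)
best = max(range(len(labels)), key=lambda i: probs[i])
result = {"label": labels[best], "score": probs[best]}
print(result)
```

A mixed review like the one above may get a score well below 1.0, so it can be worth printing `c[0]['score']` next to the label.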


##Classifying each word in a sentence

In [5]:
sentence = "Singapore Airlines was the first airline to fly the A380. Chew Choon Seng was Singapore Airline's CEO at the time. Singapore Airlines flies to New York daily."
# aggregation_strategy='simple' replaces the deprecated grouped_entities=True
ner = pipeline('token-classification', model='dbmdz/bert-large-cased-finetuned-conll03-english', aggregation_strategy='simple')
ners = ner(sentence)
print('\nSentence:')
print(wrapper.fill(sentence))
print('\n')
for n in ners:
  print(f"{n['word']} -> {n['entity_group']}")


Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).






Sentence:
Singapore Airlines was the first airline to fly the A380. Chew Choon Seng was
Singapore Airline's CEO at the time. Singapore Airlines flies to New York daily.


Singapore Airlines -> ORG
A380 -> MISC
Chew Choon Seng -> PER
Singapore Airline -> ORG
Singapore Airlines -> ORG
New York -> LOC
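
The model tags individual tokens, and the pipeline then merges neighbouring tokens of the same type into the spans printed above. A rough sketch of that merging idea on hand-written tags (the token list, the BIO-style tags and `group_entities` are assumptions for illustration; the real pipeline also deals with sub-word pieces and confidence scores):

```python
def group_entities(tagged_tokens):
    """Merge consecutive tokens that share an entity type into one span."""
    groups = []
    current_words, current_type = [], None
    for word, tag in tagged_tokens:
        # "O" means outside any entity; other tags look like "B-ORG" or "I-ORG".
        entity_type = None if tag == "O" else tag.split("-", 1)[1]
        starts_new = tag.startswith("B-") or entity_type != current_type
        if starts_new and current_words:
            # Close the span collected so far before starting another one.
            groups.append((" ".join(current_words), current_type))
            current_words = []
        if entity_type is not None:
            current_words.append(word)
        current_type = entity_type
    if current_words:
        groups.append((" ".join(current_words), current_type))
    return groups

# Hypothetical token-level predictions, written by hand.
tagged = [("Singapore", "B-ORG"), ("Airlines", "I-ORG"), ("flies", "O"), ("to", "O"),
          ("New", "B-LOC"), ("York", "I-LOC"), ("daily", "O")]
entities = group_entities(tagged)
for text, entity_type in entities:
    print(f"{text} -> {entity_type}")
```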


##Answering a question

In [6]:
context = '''
Singapore Airlines was founded in 1947 and was originally known as Malayan Airways. It is the national airline of Singapore and is based at Singapore Changi Airport.
From this hub, the airline flies to more than 60 destinations, with flights to Seoul, Tokyo and Melbourne among the most popular of its routes.
It is particularly strong in Southeast Asian and Australian destinations (the so-called Kangaroo Route), but also flies to 6 different continents, covering 35 countries.
There are more than 100 planes in the Singapore Airlines fleet, most of which are Airbus aircraft plus a smaller amount Boeings.
The company is known for frequently updating the aircraft in its fleet.'''


question1 = 'How many aircrafts does Singapore Airlines have?'
question2 = 'When was the airline founded?'

print('Text:')
print(wrapper.fill(context))
print('\nQuestion:')
print(question1 + '\n')
print(question2)

Text:
 Singapore Airlines was founded in 1947 and was originally known as Malayan
Airways. It is the national airline of Singapore and is based at Singapore
Changi Airport. From this hub, the airline flies to more than 60 destinations,
with flights to Seoul, Tokyo and Melbourne among the most popular of its routes.
It is particularly strong in Southeast Asian and Australian destinations (the
so-called Kangaroo Route), but also flies to 6 different continents, covering 35
countries. There are more than 100 planes in the Singapore Airlines fleet, most
of which are Airbus aircraft plus a smaller amount Boeings. The company is known
for frequently updating the aircraft in its fleet.

Question:
How many aircrafts does Singapore Airlines have?

When was the airline founded?


In [10]:
from transformers import pipeline

qa = pipeline('question-answering', model='distilbert-base-cased-distilled-squad')

a = qa(context=context, question=question1)
print('\nQuestion:')
print(question1 + '\n')
print('Answer:')
a['answer']


Question:
How many aircrafts does Singapore Airlines have?

Answer:


'more than 100'

In [13]:
from transformers import pipeline

# One pipeline instance can answer both questions.
qa = pipeline('question-answering', model='distilbert-base-cased-distilled-squad')

a = qa(context=context, question=question1)
print('\nQuestion:')
print(question1 + '\n')
print('Answer:')
print(a['answer'])  # print() is needed here: only a cell's last expression is displayed automatically

b = qa(context=context, question=question2)

print('\nQuestion:')
print(question2 + '\n')
print('Answer:')
b['answer']  # displayed automatically because it is the last expression in the cell


Question:
How many aircrafts does Singapore Airlines have?

Answer:
more than 100

Question:
When was the airline founded?

Answer:


'1947'
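
Both answers are copied straight out of the context: an extractive question-answering model scores each context token as a possible start and end of the answer, and the best-scoring span is returned. A simplified sketch of that selection step, with invented scores (`best_span`, the token list and the numbers are illustrative assumptions, not the model's actual values):

```python
def best_span(start_scores, end_scores, max_len=5):
    """Return (start, end) with the highest start + end score, where start <= end."""
    best, best_score = (0, 0), float("-inf")
    for s, s_score in enumerate(start_scores):
        # Only consider spans of at most max_len tokens beginning at s.
        for e in range(s, min(s + max_len, len(end_scores))):
            score = s_score + end_scores[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best

# Hypothetical per-token scores for a fragment of the context.
tokens = ["There", "are", "more", "than", "100", "planes"]
start_scores = [0.1, 0.2, 3.0, 0.5, 1.5, 0.1]
end_scores = [0.0, 0.1, 0.2, 0.4, 3.5, 1.0]

start, end = best_span(start_scores, end_scores)
answer = " ".join(tokens[start:end + 1])
print(answer)
```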

##Text summarization

In [14]:
review = '''
Extremely unusual time to fly as we needed an exemption to fly out of Australia from the government. We obtained one as working in Tokyo for the year as teachers.
The check in procedure does take a lot longer as more paperwork and phone calls are needed to check if you are allowed to travel. The staff were excellent in explaining the procedure as they are working with very few numbers.
The flight had 40 people only, so lots of room and yes we had 3 seats each. The service of meals and beverages was done very quickly and efficiently.
Changi airport was like a ghost town with most shops closed and all passengers are walked/transported to a transit zone until your next flight is ready. You are then walked in single file or transported to your next flight, so very strange as at times their seemed be more workers in PPE gear than passengers.
The steps we went through at Narita were extensive, downloading apps, fill in paperwork and giving a saliva sample to test for covid 19.
It took about 2 hours to get through the steps and we only sat down for maybe 10 minutes at the last stop to get back your covid results.
The people involved were fantastic and we were lucky that we were numbers two and three in the initial first line up, but still over 2 hours it took so be aware. We knew we were quick as the people picking us up told us we were first out.'''

print('\nOriginal text:\n')
print(wrapper.fill(review))
summarize = pipeline('summarization', model='sshleifer/distilbart-cnn-12-6')
summarized_text = summarize(review)[0]['summary_text']
print('\nSummarized text:')
print(wrapper.fill(summarized_text))


Original text:

 Extremely unusual time to fly as we needed an exemption to fly out of Australia
from the government. We obtained one as working in Tokyo for the year as
teachers. The check in procedure does take a lot longer as more paperwork and
phone calls are needed to check if you are allowed to travel. The staff were
excellent in explaining the procedure as they are working with very few numbers.
The flight had 40 people only, so lots of room and yes we had 3 seats each. The
service of meals and beverages was done very quickly and efficiently. Changi
airport was like a ghost town with most shops closed and all passengers are
walked/transported to a transit zone until your next flight is ready. You are
then walked in single file or transported to your next flight, so very strange
as at times their seemed be more workers in PPE gear than passengers. The steps
we went through at Narita were extensive, downloading apps, fill in paperwork
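and giving a saliva sample to test for covid 19. It took about 2 hours to get
through the steps and we only sat down for maybe 10 minutes at the last stop to
get back your covid results. The people involved were fantastic and we were
lucky that we were numbers two and three in the initial first line up, but still
over 2 hours it took so be aware. We knew we were quick as the people picking us
up told us we were first out.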



Summarized text:
 The flight had 40 people only, so lots of room and yes we had 3 seats each .
The service of meals and beverages was done very quickly and efficiently . The
check in procedure does take a lot longer as more paperwork and phone calls are
needed to check if you are allowed to travel .


In [15]:
review = '''
Miss Brill' is the story of an old woman told brilliantly and realistically, balancing thoughts and emotions that sustain her late solitary life amidst all the bustle of modern life. Miss Brill is a regular visitor on Sundays to the Jardins Publiques (the Public Gardens) of a small French suburb where she sits and watches all sorts of people come and go. She listens to the band playing, loves to watch people and guess what keeps them going, and enjoys contemplating the world as a great stage upon which actors perform. She finds herself to be another actor among the so many she sees, or at least herself as 'part of the performance after all.' One Sunday Miss Brill puts on her fur and goes to the Public Gardens as usual. The evening ends with her sudden realization that she is old and lonely, a realization brought to her by a conversation she overhears between a boy and a girl, presumably lovers, who comment on her unwelcome presence in their vicinity. Miss Brill is sad and depressed as she returns home, not stopping by as usual to buy her Sunday delicacy, a slice of honey-cake. She retires to her dark room, puts the fur back into the box and imagines that she has heard something cry.'''

print('\nOriginal text:\n')
print(wrapper.fill(review))
summarize = pipeline('summarization', model='sshleifer/distilbart-cnn-12-6')
summarized_text = summarize(review)[0]['summary_text']
print('\nSummarized text:')
print(wrapper.fill(summarized_text))


Original text:

 Miss Brill' is the story of an old woman told brilliantly and realistically,
balancing thoughts and emotions that sustain her late solitary life amidst all
the bustle of modern life. Miss Brill is a regular visitor on Sundays to the
Jardins Publiques (the Public Gardens) of a small French suburb where she sits
and watches all sorts of people come and go. She listens to the band playing,
loves to watch people and guess what keeps them going, and enjoys contemplating
the world as a great stage upon which actors perform. She finds herself to be
another actor among the so many she sees, or at least herself as 'part of the
performance after all.' One Sunday Miss Brill puts on her fur and goes to the
Public Gardens as usual. The evening ends with her sudden realization that she
is old and lonely, a realization brought to her by a conversation she overhears
between a boy and a girl, presumably lovers, who comment on her unwelcome
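presence in their vicinity. Miss Brill is sad and depressed as she returns home,
not stopping by as usual to buy her Sunday delicacy, a slice of honey-cake. She
retires to her dark room, puts the fur back into the box and imagines that she
has heard something cry.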

##Fill in the blanks

In [17]:
sentence = 'It is the national <mask> of India'
mask = pipeline('fill-mask', model='distilroberta-base')
masks = mask(sentence)
for m in masks:
  print(m['sequence'])

Some weights of the model checkpoint at distilroberta-base were not used when initializing RobertaForMaskedLM: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


It is the national anthem of India
It is the national treasure of India
It is the national pride of India
It is the national motto of India
It is the national heritage of India


In [18]:
sentence = 'Singapore Airlines is the national <mask> of Singapore'
mask = pipeline('fill-mask', model='distilroberta-base')
masks = mask(sentence)
for m in masks:
  print(m['sequence'])

Some weights of the model checkpoint at distilroberta-base were not used when initializing RobertaForMaskedLM: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Singapore Airlines is the national airline of Singapore
Singapore Airlines is the national carrier of Singapore
Singapore Airlines is the national airport of Singapore
Singapore Airlines is the national airlines of Singapore
Singapore Airlines is the national capital of Singapore
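
Each printed line is the sentence with `<mask>` replaced by one of the model's highest-probability candidates, which is why implausible completions such as "capital" can still show up lower in the list. A small sketch of that ranking step, assuming a made-up probability table (the words and numbers are illustrative, not real model output):

```python
import heapq

# Hypothetical candidate probabilities for the masked position.
candidate_probs = {
    "airline": 0.62,
    "carrier": 0.21,
    "airport": 0.05,
    "airlines": 0.04,
    "capital": 0.02,
    "bird": 0.01,
}
template = "Singapore Airlines is the national <mask> of Singapore"

# Keep the three most probable candidates, highest first.
top = heapq.nlargest(3, candidate_probs.items(), key=lambda item: item[1])
sequences = [template.replace("<mask>", word) for word, _ in top]
for sequence in sequences:
    print(sequence)
```

In the notebook's own results, each entry `m` also has a `score` and a `token_str` field, which can be printed alongside `m['sequence']` to see how confident the model is in each candidate.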


##Translation (English to German)

In [20]:
english = '''It took about 2 hours to get through the steps and we only sat down for maybe 10 minutes at the last stop to get back your covid results. '''

translator = pipeline('translation_en_to_de', model='t5-base')
german = translator(english)
print('\nEnglish:')
print(english)
print('\nGerman:')
print(german[0]['translation_text'])


English:
It took about 2 hours to get through the steps and we only sat down for maybe 10 minutes at the last stop to get back your covid results. 

German:
Es dauerte ca. 2 Stunden, die Schritte zu durchlaufen und wir saßen nur für etwa 10 Minuten an der letzten Haltestelle, um Ihre Ergebnisse zurückzuholen.
