# NLP Use Cases:

 * Classifying whole sentences 
 * Classifying each word in a sentence (Named Entity Recognition)
 * Answering a question given a context
 * Text summarization
 * Fill in the blanks
 * Translating from one language to another  

In [2]:
from transformers import pipeline 
import textwrap

wrapper = textwrap.TextWrapper(width=80, break_long_words=False, break_on_hyphens=False)
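
The `wrapper` configured above only changes how text is printed, not what the models receive. A minimal sketch of its behavior follows; the sample string and the narrow width of 30 are assumptions chosen purely to make the wrapping visible.

```python
import textwrap

# Narrow width chosen only to make the line breaks easy to see.
demo_wrapper = textwrap.TextWrapper(width=30, break_long_words=False, break_on_hyphens=False)

# Made-up sample containing a hyphenated word and a long URL.
sample = "A state-of-the-art model from https://huggingface.co/models summarizes text"
wrapped = demo_wrapper.fill(sample)
print(wrapped)

# Lines break only at whitespace, so hyphenated words and URLs stay whole.
lines = wrapped.split("\n")
```

With `break_long_words=False` and `break_on_hyphens=False`, a long token is kept on its own line rather than being cut in the middle.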

### Classifying Whole Sentences:

In [4]:
# The sentences used here are real reviews found online.
# The model has to determine whether the sentiment of a review is positive
# or negative.

sentence = "This is still one of the best phones in 2023, I would not exchange it for any phone in the world including the 14 Pro Max"
classifier = pipeline('text-classification', model='distilbert-base-uncased-finetuned-sst-2-english')
c = classifier(sentence)

print('\nSentence:')
print(wrapper.fill(sentence))
print(f"\nThis sentence is classified with a {c[0]['label']} sentiment")

# The output below shows that the sentiment was labeled as POSITIVE.


Sentence:
This is still one of the best phones in 2023, I would not exchange it for any
phone in the world including the 14 Pro Max

This sentence is classified with a POSITIVE sentiment
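
Besides `label`, each result dict returned by the text-classification pipeline also carries a `score`. The sketch below shows one way to report both. The helper name `describe_prediction` and the `min_score` cut-off are hypothetical, and the sample dict uses an illustrative score rather than real model output.

```python
def describe_prediction(prediction, min_score=0.75):
    """Format one result dict ({'label': ..., 'score': ...}) as readable text.

    min_score is a hypothetical cut-off below which the label is reported as uncertain.
    """
    label, score = prediction["label"], prediction["score"]
    if score < min_score:
        return f"uncertain (best guess {label}, score {score:.2f})"
    return f"{label} (score {score:.2f})"


# Illustrative value only; a real call would be describe_prediction(classifier(sentence)[0]).
example = {"label": "POSITIVE", "score": 0.98}
print(describe_prediction(example))
```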


### Classifying each Word in a Sentence (Named Entity Recognition):

In [9]:
# Named entity recognition takes words or groups of words in a sentence and
# maps each one to an entity type such as an organization, a person, or a location.

sentence = "Very tidy and lovely AirBnB apartment equipped with everything you need. A good bed and nice bathroom. Greg Towers is a great host and there when you need him, Very nice and wants to share all he know about the area. We had a great stay in London."
ner = pipeline('token-classification', model='dbmdz/bert-large-cased-finetuned-conll03-english', grouped_entities=True)
ners = ner(sentence)

print('\nSentence:')
print(wrapper.fill(sentence))
print('\n')

# Loop through the detected entities and print each one with its predicted type.
for n in ners:
    print(f"{n['word']} -> {n['entity_group']}")
    
# The output below shows that the model classified
# the entities correctly.


Sentence:
Very tidy and lovely AirBnB apartment equipped with everything you need. A good
bed and nice bathroom. Greg Towers is a great host and there when you need him,
Very nice and wants to share all he know about the area. We had a great stay in
London.


AirBnB -> ORG
Greg Towers -> PER
London -> LOC
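
When a text contains many entities, it can help to collect them by type. The following is a minimal sketch, assuming each pipeline result is a dict with `word` and `entity_group` keys as used in the loop above; the helper name `group_entities_by_type` is hypothetical.

```python
from collections import defaultdict


def group_entities_by_type(entities):
    """Map each entity type (e.g. 'PER') to the list of entity strings found."""
    grouped = defaultdict(list)
    for entity in entities:
        grouped[entity["entity_group"]].append(entity["word"])
    return dict(grouped)


# Same words and labels as the output above; a real call would pass `ners` directly.
sample_ners = [
    {"word": "AirBnB", "entity_group": "ORG"},
    {"word": "Greg Towers", "entity_group": "PER"},
    {"word": "London", "entity_group": "LOC"},
]
print(group_entities_by_type(sample_ners))
```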


### Answering a Question given a Context:

In [11]:
# Here I provide a passage as the context, ask a question about it,
# and check whether the model can extract the correct answer from the context.

context = '''
Singapore Airlines was founded in 1947 and was originally known as Malayan Airways. It is the national airline of Singapore and is based at Singapore Changi Airport. 
From this hub, the airline flies to more than 60 destinations, with flights to Seoul, Tokyo and Melbourne among the most popular of its routes. 
It is particularly strong in Southeast Asian and Australian destinations (the so-called Kangaroo Route), but also flies to 6 different continents, covering 35 countries.
There are more than 100 planes in the Singapore Airlines fleet, most of which are Airbus aircraft plus a smaller amount of Boeings.
The company is known for frequently updating the aircraft in its fleet.'''

# Notice that the question asks how many aircrafts, not how many planes.
# This tests whether the model can handle nuances in the English language.
question = 'How many aircrafts does Singapore Airlines have?'

print('Text:')
print(wrapper.fill(context))
print('\nQuestion:')
print(question)

Text:
 Singapore Airlines was founded in 1947 and was originally known as Malayan
Airways. It is the national airline of Singapore and is based at Singapore
Changi Airport.  From this hub, the airline flies to more than 60 destinations,
with flights to Seoul, Tokyo and Melbourne among the most popular of its routes.
It is particularly strong in Southeast Asian and Australian destinations (the
so-called Kangaroo Route), but also flies to 6 different continents, covering 35
countries. There are more than 100 planes in the Singapore Airlines fleet, most
of which are Airbus aircraft plus a smaller amount of Boeings. The company is
known for frequently updating the aircraft in its fleet.

Question:
How many aircrafts does Singapore Airlines have?


In [12]:
qa = pipeline('question-answering', model='distilbert-base-cased-distilled-squad')

print('\nQuestion:')
print(question + '\n')
print('Answer:')

a = qa(context=context, question=question)
a['answer']


Question:
How many aircrafts does Singapore Airlines have?

Answer:


'more than 100'
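
The question-answering pipeline is extractive: in addition to `answer`, the result dict holds `start` and `end` character offsets into the context. A small sketch of using those offsets to show where the answer sits follows. The helper `show_answer_in_context` and its `window` parameter are hypothetical, and the offsets are computed with `str.find` on a shortened stand-in passage instead of coming from the model.

```python
def show_answer_in_context(context, start, end, window=30):
    """Return the answer span in brackets with up to `window` characters on each side."""
    left = context[max(0, start - window):start]
    right = context[end:end + window]
    return f"...{left}[{context[start:end]}]{right}..."


# Shortened stand-in for the passage above.
sample_context = "There are more than 100 planes in the Singapore Airlines fleet."
answer = "more than 100"

# Offsets are derived here with str.find; a real call would use a['start'] and a['end'].
start = sample_context.find(answer)
end = start + len(answer)
print(show_answer_in_context(sample_context, start, end))
```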

### Text Summarization:

In [14]:
# Text summarization does exactly what the name says:
# the model condenses a lengthy passage into a short summary (three sentences in this case).

review = '''
Mine is 92 percent after 11 months of use. I have mostly used samsung's 45W charger but sometimes the original 20W from apple. 
Never charged overnight and rarely to 100 percent, the battery health is not important and not accurate at all but in my case I know why it is showing 92 percent, because of the fast charging with samsung charger the phone can draw that 27W max charging speed and that must have caused some trouble with the calibration of the battery health. My friend uses 61W apple charger from his MacBook and has 90 percent health after 1 year, so it is normal and does not matter at all, if you know to know the state of your battery you should go and check battery cycle count.
'''

print('\nOriginal Text:\n')
print(wrapper.fill(review))

summarize = pipeline('summarization', model='sshleifer/distilbart-cnn-12-6')
summarized_text = summarize(review)[0]['summary_text']

print('\nSummarized Text:')
print(wrapper.fill(summarized_text))


Original Text:

 Mine is 92 percent after 11 months of use. I have mostly used samsung's 45W
charger but sometimes the original 20W from apple.  Never charged overnight and
rarely to 100 percent, the battery health is not important and not accurate at
all but in my case I know why it is showing 92 percent, because of the fast
charging with samsung charger the phone can draw that 27W max charging speed and
that must have caused some trouble with the calibration of the battery health.
My friend uses 61W apple charger from his MacBook and has 90 percent health
after 1 year, so it is normal and does not matter at all, if you know to know
the state of your battery you should go and check battery cycle count.



Summarized Text:
 The battery health is not important and not accurate at all but in my case I
know why it is showing 92 percent . Fast charging with samsung charger the phone
can draw that 27W max charging speed . My friend uses 61W apple charger from his
MacBook and has 90 percent health after 1 year .
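
One simple way to quantify how much a summary shortens its source is a word-count ratio. The sketch below assumes whitespace-separated words are a good enough unit. The helper `compression_ratio` is hypothetical, and the short sample strings are hand-written stand-ins, not model output.

```python
def compression_ratio(original, summary):
    """Return summary length divided by original length, counted in whitespace-separated words."""
    original_words = len(original.split())
    if original_words == 0:
        raise ValueError("original text is empty")
    return len(summary.split()) / original_words


# Hand-written stand-ins; a real call would be compression_ratio(review, summarized_text).
sample_original = "Mine is 92 percent after 11 months of use."
sample_summary = "92 percent after 11 months."
ratio = compression_ratio(sample_original, sample_summary)
print(f"Summary is {ratio:.0%} of the original length")
```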


### Fill in The Blanks:

In [15]:
# In this example, I feed in the sentence 'It is the national <mask> of Puerto Rico'.
# The model returns candidate sentences with the mask filled in, ordered from most to least probable.

sentence = 'It is the national <mask> of Puerto Rico'
mask = pipeline('fill-mask', model='distilroberta-base')
masks = mask(sentence)

for m in masks:
    print(m['sequence'])

# The output lists the candidates from the most probable to the least probable.


It is the national anthem of Puerto Rico
It is the national capital of Puerto Rico
It is the national motto of Puerto Rico
It is the national holiday of Puerto Rico
It is the national flag of Puerto Rico


In [16]:
sentence = 'My mother is the most <mask> person in the world'
mask = pipeline('fill-mask', model='distilroberta-base')
masks = mask(sentence)

for m in masks:
    print(m['sequence'])

My mother is the most beautiful person in the world
My mother is the most famous person in the world
My mother is the most amazing person in the world
My mother is the most powerful person in the world
My mother is the most important person in the world
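
The `<mask>` placeholder used above is specific to RoBERTa-style models; BERT-style models expect `[MASK]` instead. A minimal sketch of building the input from a template, so the same sentence can be reused across models, is shown below. The helper `make_masked_sentence` and the `{mask}` template convention are hypothetical; in a real notebook the token could be read from `mask.tokenizer.mask_token`.

```python
def make_masked_sentence(template, mask_token):
    """Replace the single '{mask}' placeholder in template with the model's mask token."""
    if template.count("{mask}") != 1:
        raise ValueError("template must contain exactly one '{mask}' placeholder")
    return template.replace("{mask}", mask_token)


template = "It is the national {mask} of Puerto Rico"
print(make_masked_sentence(template, "<mask>"))  # RoBERTa-style token, as used above
print(make_masked_sentence(template, "[MASK]"))  # BERT-style token
```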


### Translation (English to German):

In [18]:
english = '''My favorite food is steak and fries'''

translator = pipeline('translation_en_to_de', model='t5-base')
german = translator(english)

print('\nEnglish:')
print(english)
print('\nGerman:')
print(german[0]['translation_text'])


English:
My favorite food is steak and fries

German:
Mein Lieblingsgericht sind Steak und Pommes Frites.
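
The task string passed to `pipeline` encodes the language pair, here `translation_en_to_de` for English to German. The sketch below builds that string and rejects pairs the checkpoint is not expected to handle. It assumes that `t5-base` supports translation from English into German, French, and Romanian only; the names `T5_TARGETS` and `t5_translation_task` are hypothetical.

```python
# Assumed list of target languages that t5-base can translate into from English.
T5_TARGETS = {"de": "German", "fr": "French", "ro": "Romanian"}


def t5_translation_task(target_code):
    """Build the 'translation_en_to_xx' task name for a supported target language code."""
    if target_code not in T5_TARGETS:
        raise ValueError(f"no English-to-{target_code} translation assumed for t5-base")
    return f"translation_en_to_{target_code}"


print(t5_translation_task("de"))
```

For other language pairs, a different checkpoint would be needed instead of `t5-base`.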
