<a href="https://colab.research.google.com/github/astrapi69/DroidBallet/blob/master/NLP_D3_4_L5_NLP_Applications_with_Transformers_and_LLMs.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<center><a target="_blank" href="https://academy.constructor.org/"><img src="https://jobtracker.ai/static/media/constructor_academy_colour.b86fa87f.png" width="200" style="background:none; border:none; box-shadow:none;" /></a> </center>

_____

<center>Constructor Academy, 2024</center>

# NLP applications with Transformers and Large Language Models (LLMs)

## Leveraging Transformers Pipelines

__Credit:__ This notebook is adapted from the examples in HuggingFace's __`transformers`__ package. You can check out their repository [here](https://github.com/huggingface/transformers).

Introduced in transformers v2.3.0, **pipelines** provide a high-level, easy-to-use
API for running inference on a variety of downstream tasks, including:

- ***Sentence Classification _(Sentiment Analysis)_***: Indicate whether the overall sentence is positive or negative, i.e. a *binary classification task*.
- ***Token Classification (Named Entity Recognition, Part-of-Speech tagging)***: For each sub-entity _(*token*)_ in the input, assign a label, i.e. a classification task.
- ***Question-Answering***: Given a tuple (`question`, `context`), the model should find the span of text in `context` that answers the `question`.
- ***Mask-Filling***: Suggest possible word(s) to fill the masked input, given the surrounding `context`.
- ***Summarization***: Summarize the `input` article into a shorter text.
- ***Translation***: Translate the input from one language to another.
- ***Feature Extraction***: Map the input to a higher, multi-dimensional space learned from the data.

Pipelines encapsulate the three steps common to every NLP task:

 1. *Tokenization*: Split the initial input into multiple sub-entities with ... properties (i.e. tokens).
 2. *Inference*: Map every token into a more meaningful representation.
 3. *Decoding*: Use the above representation to generate and/or extract the final output for the underlying task.
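The three steps above can be made concrete by running them by hand for sentiment analysis. The sketch below is illustrative only and is not the library's internal implementation. It assumes PyTorch is installed and uses the `distilbert-base-uncased-finetuned-sst-2-english` checkpoint, which is the default the sentiment pipeline reports later in this notebook. The helper names `softmax` and `classify_manually` are hypothetical.

```python
import math


def softmax(logits):
    """Turn raw model scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_manually(text, model_name="distilbert-base-uncased-finetuned-sst-2-english"):
    # Imported lazily so the pure-Python helper above works without these libraries.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)

    # 1. Tokenization: text -> token ids
    inputs = tokenizer(text, return_tensors="pt")

    # 2. Inference: token ids -> one raw score (logit) per class
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()

    # 3. Decoding: logits -> probabilities -> best label
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": model.config.id2label[best], "score": probs[best]}
```

Calling `classify_manually('This is an excellent movie!')` downloads the model on first use; the `pipeline()` call wraps these same three stages behind a single function.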

The overall API is exposed to the end user through the `pipeline()` function, which has the following
structure:

```python
from transformers import pipeline

# Using default model and tokenizer for the task
pipeline("<task-name>")

# Using a user-specified model
pipeline("<task-name>", model="<model_name>")

# Using a custom model and tokenizer, both given as strings
pipeline("<task-name>", model="<model_name>", tokenizer="<tokenizer_name>")
```


The models used here are available on the Hugging Face Hub and have already been fine-tuned for specific tasks.

In [None]:
import locale
locale.getpreferredencoding = lambda: "UTF-8"

### Install dependencies

In [None]:
!pip install -q transformers

In [None]:
from transformers import pipeline

### 1. Sentence Classification - Sentiment Analysis

In [None]:
nlp_sentiment_model = pipeline('sentiment-analysis')

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
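The warning above recommends naming the model and revision explicitly. A minimal sketch of doing so, assuming the model name and revision printed in the log are the ones you want to pin; `PINNED` and `load_pinned_pipeline` are hypothetical names:

```python
# Model name and revision copied from the log message above.
PINNED = {
    "task": "sentiment-analysis",
    "model": "distilbert-base-uncased-finetuned-sst-2-english",
    "revision": "af0f99b",
}


def load_pinned_pipeline(spec=PINNED):
    # Imported lazily; calling this downloads the model on first use.
    from transformers import pipeline

    return pipeline(spec["task"], model=spec["model"], revision=spec["revision"])
```

Pinning both values means the notebook keeps loading the same weights even if the library's default for the task changes.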



In [None]:
nlp_sentiment_model('This is an excellent movie! Really nice plot and casting.')

[{'label': 'POSITIVE', 'score': 0.9998741149902344}]

In [None]:
nlp_sentiment_model('This movie was so NOT good!')

[{'label': 'NEGATIVE', 'score': 0.9998019337654114}]

In [None]:
nlp_sentiment_model('I tried to like this book but I definitely did not enjoy reading it!')

[{'label': 'NEGATIVE', 'score': 0.9994773268699646}]

In [None]:
sentiment_model = pipeline('sentiment-analysis',
                           model='sbcBI/sentiment_analysis')


In [None]:
sentiment_model('The sun rises in the east')

[{'label': 'LABEL_1', 'score': 0.6036325693130493}]

In [None]:
sentiment_model('The movie was horrible')

[{'label': 'LABEL_0', 'score': 0.5034124851226807}]

In [None]:
sentiment_model('The movie was great')

[{'label': 'LABEL_2', 'score': 0.9853494763374329}]
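This model returns generic `LABEL_n` names. Judging only from the three outputs above, `LABEL_0` looks negative, `LABEL_1` neutral, and `LABEL_2` positive. That mapping is an assumption and should be checked against the model card. A small hypothetical helper, `readable`, applies such a mapping:

```python
# Assumed mapping, inferred from the three example outputs; verify on the model card.
LABEL_NAMES = {"LABEL_0": "negative", "LABEL_1": "neutral", "LABEL_2": "positive"}


def readable(predictions, names=LABEL_NAMES):
    """Replace generic labels with readable ones, leaving unknown labels untouched."""
    return [
        {**pred, "label": names.get(pred["label"], pred["label"])}
        for pred in predictions
    ]


print(readable([{"label": "LABEL_2", "score": 0.9853494763374329}]))
```

Note that the scores for the first two examples (about 0.60 and 0.50) are low, so the label alone can hide how uncertain this model is.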

### 2. Question Answering

In [None]:
nlp_qa = pipeline('question-answering')

No model was supplied, defaulted to distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.



In [None]:
context = """
Coronaviruses are a large family of viruses which may cause illness in animals or humans.
In humans, several coronaviruses are known to cause respiratory infections ranging from the
common cold to more severe diseases such as Middle East Respiratory Syndrome (MERS) and Severe Acute Respiratory Syndrome (SARS).
The most recently discovered coronavirus causes coronavirus disease COVID-19.
COVID-19 is the infectious disease caused by the most recently discovered coronavirus.
This new virus and disease were unknown before the outbreak began in Wuhan, China, in December 2019.
COVID-19 is now a pandemic affecting many countries globally.
The most common symptoms of COVID-19 are fever, dry cough, and tiredness.
Other symptoms that are less common and may affect some patients include aches
and pains, nasal congestion, headache, conjunctivitis, sore throat, diarrhea,
loss of taste or smell or a rash on skin or discoloration of fingers or toes.
These symptoms are usually mild and begin gradually.
Some people become infected but only have very mild symptoms.
Most people (about 80%) recover from the disease without needing hospital treatment.
Around 1 out of every 5 people who gets COVID-19 becomes seriously ill and develops difficulty breathing.
Older people, and those with underlying medical problems like high blood pressure, heart and lung problems,
diabetes, or cancer, are at higher risk of developing serious illness.
However, anyone can catch COVID-19 and become seriously ill.
People of all ages who experience fever and/or  cough associated with difficulty breathing/shortness of breath,
chest pain/pressure, or loss of speech or movement should seek medical attention immediately.
If possible, it is recommended to call the health care provider or facility first,
so the patient can be directed to the right clinic.
People can catch COVID-19 from others who have the virus.
The disease spreads primarily from person to person through small droplets from the nose or mouth,
which are expelled when a person with COVID-19 coughs, sneezes, or speaks.
These droplets are relatively heavy, do not travel far and quickly sink to the ground.
People can catch COVID-19 if they breathe in these droplets from a person infected with the virus.
This is why it is important to stay at least 1 meter) away from others.
These droplets can land on objects and surfaces around the person such as tables, doorknobs and handrails.
People can become infected by touching these objects or surfaces, then touching their eyes, nose or mouth.
This is why it is important to wash your hands regularly with soap and water or clean with alcohol-based hand rub.
Practicing hand and respiratory hygiene is important at ALL times and is the best way to protect others and yourself.
When possible maintain at least a 1 meter distance between yourself and others.
This is especially important if you are standing by someone who is coughing or sneezing.
Since some infected persons may not yet be exhibiting symptoms or their symptoms may be mild,
maintaining a physical distance with everyone is a good idea if you are in an area where COVID-19 is circulating.
"""

In [None]:
nlp_qa(context=context, question='What is a coronavirus ?')

{'score': 0.6717578768730164,
 'start': 19,
 'end': 89,
 'answer': 'a large family of viruses which may cause illness in animals or humans'}
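The `start` and `end` fields are character offsets into the context string, so slicing the context recovers the answer. The sketch below checks this on the first line of the context only; the `answer_or_none` helper and its `min_score=0.5` threshold are hypothetical choices for illustration, not part of the pipeline API.

```python
# First line of the context above, including the leading newline of the triple-quoted string.
snippet = "\nCoronaviruses are a large family of viruses which may cause illness in animals or humans.\n"

# Result copied from the output above.
result = {
    "score": 0.6717578768730164,
    "start": 19,
    "end": 89,
    "answer": "a large family of viruses which may cause illness in animals or humans",
}

# start/end index into the context, so the slice equals the answer.
assert snippet[result["start"]:result["end"]] == result["answer"]


def answer_or_none(result, min_score=0.5):
    """Return the answer only when the model's score clears a threshold."""
    return result["answer"] if result["score"] >= min_score else None
```

A threshold like this is one simple way to avoid showing low-confidence spans to users.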

In [None]:
nlp_qa(context=context, question='What is covid-19 ?')

{'score': 0.42547106742858887,
 'start': 407,
 'end': 476,
 'answer': 'infectious disease caused by the most recently discovered coronavirus'}

In [None]:
nlp_qa(context=context, question='What are covid-19 symptoms ?')

{'score': 0.8406679630279541,
 'start': 682,
 'end': 713,
 'answer': 'fever, dry cough, and tiredness'}

In [None]:
nlp_qa(context=context, question='How do people get infected by covid-19 ?')

{'score': 0.3460099995136261,
 'start': 2464,
 'end': 2498,
 'answer': 'touching these objects or surfaces'}

In [None]:
nlp_qa(context=context, question='How can we protect ourselves from covid-19 ?')

{'score': 0.6492880582809448,
 'start': 2656,
 'end': 2695,
 'answer': 'Practicing hand and respiratory hygiene'}

### 3. Summarization

Summarization is currently supported by `Bart` and `T5`.

In [None]:
summarizer = pipeline('summarization')

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.



In [None]:
# Reuse the COVID-19 text defined as `context` in the Question Answering section
BIG_DOC = context


result = summarizer(BIG_DOC)

In [None]:
import nltk
nltk.download('punkt')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


True

In [None]:
result

[{'summary_text': ' COVID-19 is the infectious disease caused by the most recently discovered coronavirus . The most common symptoms of the disease are fever, dry cough, and tiredness . Around 80% of people recover from the disease without needing hospital treatment . Around 1 out of every 5 people who gets the disease becomes seriously ill and develops difficulty breathing .'}]

In [None]:
summary = result[0]['summary_text']
print('\n'.join(nltk.sent_tokenize(summary)))

 COVID-19 is the infectious disease caused by the most recently discovered coronavirus .
The most common symptoms of the disease are fever, dry cough, and tiredness .
Around 80% of people recover from the disease without needing hospital treatment .
Around 1 out of every 5 people who gets the disease becomes seriously ill and develops difficulty breathing .
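The summary above has a leading space and a space before each period. A hedged sketch of a cleanup step using only the standard library follows; `tidy_summary` and `split_sentences` are hypothetical helpers, and the splitter is deliberately naive (it would break on abbreviations such as "e.g."), so `nltk.sent_tokenize` remains the more robust choice.

```python
import re


def tidy_summary(text):
    """Strip outer whitespace and remove the space left before punctuation."""
    return re.sub(r"\s+([.,!?])", r"\1", text.strip())


def split_sentences(text):
    """Naive splitter: break after ., ! or ? when followed by whitespace."""
    return re.split(r"(?<=[.!?])\s+", text)


raw = " COVID-19 is the infectious disease caused by the most recently discovered coronavirus . The most common symptoms of the disease are fever, dry cough, and tiredness ."
print("\n".join(split_sentences(tidy_summary(raw))))
```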


### 4. Translation

Translation pipelines are selected with task names of the form `translation_XX_to_YY`, where `XX` and `YY` are the source and target language codes.

In [None]:
# English to French
translator = pipeline('translation_en_to_fr')

No model was supplied, defaulted to t5-base and revision 686f1db (https://huggingface.co/t5-base).
Using a pipeline without specifying a model name and revision in production is not recommended.



For now, this behavior is kept to avoid breaking backwards compatibility when padding/encoding with `truncation is True`.
- Be aware that you SHOULD NOT rely on t5-base automatically truncating your input to 512 when padding/encoding.
- If you want to encode/pad to sequences longer than 512 you can either instantiate this tokenizer with `model_max_length` or pass `max_length` when encoding/padding.


In [None]:
translator("The quick brown fox jumped over the lazy dog")

[{'translation_text': 'Le renard brun rapide saute au-dessus du chien piètre'}]

In [None]:
# English to German
translator = pipeline('translation_en_to_de')

No model was supplied, defaulted to t5-base and revision 686f1db (https://huggingface.co/t5-base).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [None]:
translator("The quick brown fox jumped over the lazy dog")

[{'translation_text': 'Der schnelle braune Fuchs sprang über den faulen Hund'}]

### 5. Text Categorization or Classification

Text classification is supported through zero-shot models, which need no task-specific training: the candidate labels are supplied at inference time.
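Zero-shot classification recasts the task as natural language inference: each candidate label is inserted into a hypothesis sentence, and the model scores how strongly the text entails it. The sketch below only builds those hypotheses. The template string is, as far as we know, the pipeline's default `hypothesis_template`, but treat it as an assumption; `build_hypotheses` is a hypothetical helper.

```python
def build_hypotheses(labels, template="This example is {}."):
    """Create one NLI hypothesis per candidate label."""
    return [template.format(label) for label in labels]


print(build_hypotheses(["sports", "business"]))
```

A more specific template (for example, one that mentions news articles) can be passed to the pipeline through its `hypothesis_template` argument.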

In [None]:
categorizer = pipeline('zero-shot-classification')

No model was supplied, defaulted to facebook/bart-large-mnli and revision c626438 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.



In [None]:
news_categories = ['sports', 'business', 'technology', 'entertainment']

In [None]:
doc = "Josep Minguella, the agent who took Lionel Messi to Barcelona, has revealed that the Argentine used to look up to Juan Román Riquelme. He recalled an incident from the PSG star's teenage years when he first met the Argentina legend. Speaking to INFOBAE, Minguella revealed that Messi sat at the bottom of a table and kept looking at Riquelme like he was God. He believes that the 2022 FIFA World Cup winner looked up to the Boca Juniors legend as an idol."
doc

"Josep Minguella, the agent who took Lionel Messi to Barcelona, has revealed that the Argentine used to look up to Juan Román Riquelme. He recalled an incident from the PSG star's teenage years when he first met the Argentina legend. Speaking to INFOBAE, Minguella revealed that Messi sat at the bottom of a table and kept looking at Riquelme like he was God. He believes that the 2022 FIFA World Cup winner looked up to the Boca Juniors legend as an idol."

In [None]:
categorizer(doc, news_categories)

{'sequence': "Josep Minguella, the agent who took Lionel Messi to Barcelona, has revealed that the Argentine used to look up to Juan Román Riquelme. He recalled an incident from the PSG star's teenage years when he first met the Argentina legend. Speaking to INFOBAE, Minguella revealed that Messi sat at the bottom of a table and kept looking at Riquelme like he was God. He believes that the 2022 FIFA World Cup winner looked up to the Boca Juniors legend as an idol.",
 'labels': ['sports', 'technology', 'entertainment', 'business'],
 'scores': [0.8054736256599426,
  0.07955774664878845,
  0.060199156403541565,
  0.05476945638656616]}
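The result lists labels from most to least likely, and the scores here sum to roughly 1 because the labels are treated as mutually exclusive by default (passing `multi_label=True` scores each label independently instead). A minimal sketch for picking the winning category; `top_label` and its `min_score` parameter are hypothetical:

```python
# Result copied from the output above (the 'sequence' field is omitted for brevity).
result = {
    "labels": ["sports", "technology", "entertainment", "business"],
    "scores": [0.8054736256599426, 0.07955774664878845,
               0.060199156403541565, 0.05476945638656616],
}


def top_label(result, min_score=0.0):
    """Return (label, score) for the best category, or None below the threshold."""
    label, score = max(zip(result["labels"], result["scores"]), key=lambda p: p[1])
    return (label, score) if score >= min_score else None


print(top_label(result))
print(round(sum(result["scores"]), 6))
```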

In [None]:
doc = "Intelsat, operator of one of the world’s largest integrated satellite and terrestrial networks and leading provider of inflight connectivity (IFC), ordered a Mission Extension Pod (MEP) from Northrop Grumman Corporation’s SpaceLogistics, which will add life to an Intelsat satellite and provide uninterrupted services to many customers."
doc

'Intelsat, operator of one of the world’s largest integrated satellite and terrestrial networks and leading provider of inflight connectivity (IFC), ordered a Mission Extension Pod (MEP) from Northrop Grumman Corporation’s SpaceLogistics, which will add life to an Intelsat satellite and provide uninterrupted services to many customers.'

In [None]:
categorizer(doc, news_categories)

{'sequence': 'Intelsat, operator of one of the world’s largest integrated satellite and terrestrial networks and leading provider of inflight connectivity (IFC), ordered a Mission Extension Pod (MEP) from Northrop Grumman Corporation’s SpaceLogistics, which will add life to an Intelsat satellite and provide uninterrupted services to many customers.',
 'labels': ['technology', 'business', 'entertainment', 'sports'],
 'scores': [0.7486833333969116,
  0.22467149794101715,
  0.015766151249408722,
  0.010879015550017357]}

In [None]:
doc = "It is now being reported that Morena Baccarin and Stefan Kapicic will reprise their roles as Vanessa and Colossus respectively. It was previously announced that Hugh Jackman will also return as Wolverine for the Ryan Reynolds starrer Deadpool 3. Deadpool 3 keeps getting bigger with the addition of each cast member"
doc

'It is now being reported that Morena Baccarin and Stefan Kapicic will reprise their roles as Vanessa and Colossus respectively. It was previously announced that Hugh Jackman will also return as Wolverine for the Ryan Reynolds starrer Deadpool 3. Deadpool 3 keeps getting bigger with the addition of each cast member'

In [None]:
categorizer(doc, news_categories)

{'sequence': 'It is now being reported that Morena Baccarin and Stefan Kapicic will reprise their roles as Vanessa and Colossus respectively. It was previously announced that Hugh Jackman will also return as Wolverine for the Ryan Reynolds starrer Deadpool 3. Deadpool 3 keeps getting bigger with the addition of each cast member',
 'labels': ['entertainment', 'business', 'technology', 'sports'],
 'scores': [0.5784639716148376,
  0.2582780718803406,
  0.11911319196224213,
  0.04414479807019234]}

## Leveraging OpenAI Large Language Models (LLMs) like ChatGPT

Thanks to the `openai` library and tools like `langchain`, we can easily access state-of-the-art LLMs like ChatGPT (based on GPT-3.5) and solve diverse NLP problems using simple prompt engineering.

### Load Dependencies

In [None]:
!pip install -q transformers
!pip install "openai<1.0.0" # previous but stable version

Collecting openai<1.0.0
  Downloading openai-0.28.1-py3-none-any.whl (76 kB)
Installing collected packages: openai
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
llmx 0.0.15a0 requires cohere, which is not installed.
llmx 0.0.15a0 requires tiktoken, which is not installed.
Successfully installed openai-0.28.1


#### Remember to restart the kernel

## Load OpenAI API Credentials

Here we load the API key from a file so we don't expose the credentials on the internet by mistake.

In [None]:
import locale
locale.getpreferredencoding = lambda: "UTF-8"

In [None]:
import yaml

with open('chatgpt_api_credentials.yml', 'r') as file:
    api_creds = yaml.safe_load(file)

In [None]:
api_creds.keys()

dict_keys(['openai_key'])
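An alternative that avoids keeping a credentials file next to the notebook is to read the key from an environment variable and fall back to the file only when it is missing. This is a sketch under assumptions: the variable name `OPENAI_API_KEY` and the helper `load_openai_key` are choices made here for illustration, not something this notebook's setup requires.

```python
import os


def load_openai_key(env_var="OPENAI_API_KEY", path="chatgpt_api_credentials.yml"):
    """Prefer an environment variable; fall back to the YAML file used above."""
    key = os.environ.get(env_var)
    if key:
        return key
    import yaml  # only needed for the file fallback

    with open(path, "r") as file:
        return yaml.safe_load(file)["openai_key"]
```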

## Create ChatGPT Chat Completion Access Function

ChatGPT is a state-of-the-art commercial LLM, and access to its API is paid.

This function uses the [Chat Completion API](https://platform.openai.com/docs/api-reference/chat/create) to send a prompt to ChatGPT and return its response.

In [None]:
import openai

openai.api_key = api_creds['openai_key']

def get_completion(prompt, model="gpt-3.5-turbo"):
    messages = [{"role": "user", "content": prompt}]
    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        temperature=0, # degree of randomness of the model's output
    )
    return response.choices[0].message["content"]
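`openai.ChatCompletion` exists only in `openai<1.0.0`, which is why the install cell pins that version. For reference, a rough equivalent for the 1.x client is sketched below. It is not part of this notebook's setup, and `build_messages` and `get_completion_v1` are hypothetical names.

```python
def build_messages(prompt):
    """Wrap a single user prompt in the chat message format."""
    return [{"role": "user", "content": prompt}]


def get_completion_v1(prompt, api_key, model="gpt-3.5-turbo"):
    # Requires openai>=1.0; imported lazily so defining this does not fail on 0.28.
    from openai import OpenAI

    client = OpenAI(api_key=api_key)
    response = client.chat.completions.create(
        model=model,
        messages=build_messages(prompt),
        temperature=0,  # lowest randomness, as in the function above
    )
    return response.choices[0].message.content
```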

### 1. Sentiment Analysis



In [None]:
reviews = ['This is an excellent movie! Really nice plot and casting.',
           'This movie was so NOT good!',
           'I tried to like this book but I definitely did not enjoy reading it!']

#### Sentiment Detection using multiple API calls - One for each document

In [None]:
sentiments = []

for review in reviews:
  prompt = f"""
            Detect the sentiment of the following movie review,
            delimited by triple backticks.
            The sentiment should be either positive, negative or neutral.
            Format the output showing the review text and the sentiment.

            ```{review}```
            """
  response = get_completion(prompt)
  sentiments.append(response)
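Because the prompt is an indented triple-quoted string, every line reaches the model with leading spaces. That appears harmless here, but a hypothetical helper such as `build_sentiment_prompt` keeps the prompt text clean. The triple-backtick delimiter is built with string multiplication only so that it can be shown inside a fenced block.

```python
import textwrap

FENCE = "`" * 3  # the triple-backtick delimiter used in the prompt


def build_sentiment_prompt(review):
    """Return the sentiment prompt without leading indentation."""
    template = textwrap.dedent("""\
        Detect the sentiment of the following movie review,
        delimited by triple backticks.
        The sentiment should be either positive, negative or neutral.
        Format the output showing the review text and the sentiment.

        {fence}{review}{fence}
        """)
    return template.format(fence=FENCE, review=review)


print(build_sentiment_prompt("This movie was so NOT good!"))
```

The template is dedented before the review is inserted, so unusual whitespace inside a review cannot change how much indentation is removed.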

In [None]:
for sentiment in sentiments:
  print(sentiment)
  print('\n')

Review: This is an excellent movie! Really nice plot and casting.
Sentiment: Positive


Review: This movie was so NOT good!
Sentiment: Negative


Review: I tried to like this book but I definitely did not enjoy reading it!
Sentiment: Negative




#### Sentiment Detection using one API call - Combining all documents together

In [None]:
reviews_text = "\n"

# Number each review from 1 and wrap it in triple backticks
for i, review in enumerate(reviews, start=1):
  reviews_text += f'\nReview {i}:\n```{review}```\n'

print(reviews_text)



Review 1:
```This is an excellent movie! Really nice plot and casting.```

Review 2:
```This movie was so NOT good!```

Review 3:
```I tried to like this book but I definitely did not enjoy reading it!```



In [None]:
prompt = f"""
            Detect the sentiment of the following movie reviews,
            delimited by triple backticks.
            The sentiment should be either positive, negative or neutral.
            Format the output showing the review text and the sentiment for each review.

            {reviews_text}
            """
response = get_completion(prompt)

In [None]:
print(response)

Review 1: This is an excellent movie! Really nice plot and casting.
Sentiment: Positive

Review 2: This movie was so NOT good!
Sentiment: Negative

Review 3: I tried to like this book but I definitely did not enjoy reading it!
Sentiment: Negative
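To use a batched response like this in code, it has to be parsed back into one label per review. The sketch below assumes the model keeps the `Review N: ... Sentiment: X` layout shown above, which is not guaranteed; `parse_sentiments` is a hypothetical helper that raises an error when a label falls outside the three allowed values.

```python
import re

ALLOWED = {"positive", "negative", "neutral"}


def parse_sentiments(response):
    """Map review number -> lowercase sentiment from 'Review N: ... Sentiment: X' text."""
    pattern = re.compile(r"Review (\d+):.*?Sentiment:\s*(\w+)", re.DOTALL)
    parsed = {}
    for number, sentiment in pattern.findall(response):
        label = sentiment.lower()
        if label not in ALLOWED:
            raise ValueError(f"unexpected sentiment {sentiment!r} for review {number}")
        parsed[int(number)] = label
    return parsed


sample = (
    "Review 1: This is an excellent movie! Really nice plot and casting.\n"
    "Sentiment: Positive\n\n"
    "Review 2: This movie was so NOT good!\n"
    "Sentiment: Negative\n"
)
print(parse_sentiments(sample))
```

Comparing the number of parsed entries with `len(reviews)` is a cheap check that the model did not skip or merge reviews.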


### 2. Question Answering Chatbot

In [None]:
context = """
Coronaviruses are a large family of viruses which may cause illness in animals or humans.
In humans, several coronaviruses are known to cause respiratory infections ranging from the
common cold to more severe diseases such as Middle East Respiratory Syndrome (MERS) and Severe Acute Respiratory Syndrome (SARS).
The most recently discovered coronavirus causes coronavirus disease COVID-19.
COVID-19 is the infectious disease caused by the most recently discovered coronavirus.
This new virus and disease were unknown before the outbreak began in Wuhan, China, in December 2019.
COVID-19 is now a pandemic affecting many countries globally.
The most common symptoms of COVID-19 are fever, dry cough, and tiredness.
Other symptoms that are less common and may affect some patients include aches
and pains, nasal congestion, headache, conjunctivitis, sore throat, diarrhea,
loss of taste or smell or a rash on skin or discoloration of fingers or toes.
These symptoms are usually mild and begin gradually.
Some people become infected but only have very mild symptoms.
Most people (about 80%) recover from the disease without needing hospital treatment.
Around 1 out of every 5 people who gets COVID-19 becomes seriously ill and develops difficulty breathing.
Older people, and those with underlying medical problems like high blood pressure, heart and lung problems,
diabetes, or cancer, are at higher risk of developing serious illness.
However, anyone can catch COVID-19 and become seriously ill.
People of all ages who experience fever and/or  cough associated with difficulty breathing/shortness of breath,
chest pain/pressure, or loss of speech or movement should seek medical attention immediately.
If possible, it is recommended to call the health care provider or facility first,
so the patient can be directed to the right clinic.
People can catch COVID-19 from others who have the virus.
The disease spreads primarily from person to person through small droplets from the nose or mouth,
which are expelled when a person with COVID-19 coughs, sneezes, or speaks.
These droplets are relatively heavy, do not travel far and quickly sink to the ground.
People can catch COVID-19 if they breathe in these droplets from a person infected with the virus.
This is why it is important to stay at least 1 meter) away from others.
These droplets can land on objects and surfaces around the person such as tables, doorknobs and handrails.
People can become infected by touching these objects or surfaces, then touching their eyes, nose or mouth.
This is why it is important to wash your hands regularly with soap and water or clean with alcohol-based hand rub.
Practicing hand and respiratory hygiene is important at ALL times and is the best way to protect others and yourself.
When possible maintain at least a 1 meter distance between yourself and others.
This is especially important if you are standing by someone who is coughing or sneezing.
Since some infected persons may not yet be exhibiting symptoms or their symptoms may be mild,
maintaining a physical distance with everyone is a good idea if you are in an area where COVID-19 is circulating.
"""

In [None]:
chat_text = """
Given the following context information about a disease
which is COVID-19 delimited by triple backticks,

Act as a frequently asked questions chatbot
and answer the following question delimited by triple backticks
asked factually from the information provided below in context.

Context: ```{context}```
Question: ```{query}```
"""

query='What is a coronavirus?'
prompt = chat_text.format(context=context, query=query)

response = get_completion(prompt)
response

'A coronavirus is a type of virus that can cause illness in animals and humans. There are several coronaviruses that can cause respiratory infections, ranging from the common cold to more severe diseases such as Middle East Respiratory Syndrome (MERS) and Severe Acute Respiratory Syndrome (SARS). The most recently discovered coronavirus causes the infectious disease known as COVID-19.'
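The same template can serve many questions. The sketch below wraps that loop in a hypothetical `answer_all` helper that takes the completion function as a parameter, so it can be exercised with a stand-in before spending API calls. `FAQ_TEMPLATE` is a condensed paraphrase of the prompt above, not a copy of it.

```python
FENCE = "`" * 3  # triple-backtick delimiter, built this way to fit inside a fenced block

FAQ_TEMPLATE = (
    "Given the following context information delimited by triple backticks,\n"
    "act as a frequently asked questions chatbot and answer the question\n"
    "delimited by triple backticks using only the information in the context.\n\n"
    "Context: {fence}{context}{fence}\n"
    "Question: {fence}{query}{fence}\n"
)


def answer_all(queries, context, complete):
    """Ask each question against the same context; `complete` maps prompt -> reply."""
    answers = {}
    for query in queries:
        prompt = FAQ_TEMPLATE.format(fence=FENCE, context=context, query=query)
        answers[query] = complete(prompt)
    return answers
```

In this notebook the call would look like `answer_all(['What is a coronavirus?', 'What is COVID-19?'], context, get_completion)`.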

In [None]:
query='What is COVID-19?'
prompt = chat_text.format(context=context, query=query)

response = get_completion(prompt)
response

'COVID-19 is the infectious disease caused by the most recently discovered coronavirus. It is a respiratory illness that can range from mild symptoms, such as fever, dry cough, and tiredness, to more severe symptoms, such as difficulty breathing. COVID-19 was first identified in Wuhan, China in December 2019 and has since become a global pandemic affecting many countries.'

In [None]:
query='How to prevent COVID-19?'
prompt = chat_text.format(context=context, query=query)

response = get_completion(prompt)
print(response)

To prevent COVID-19, it is important to practice good hand and respiratory hygiene. Here are some preventive measures:

1. Wash your hands regularly with soap and water for at least 20 seconds. If soap and water are not available, use an alcohol-based hand sanitizer.
2. Avoid touching your face, especially your eyes, nose, and mouth, as the virus can enter your body through these areas.
3. Maintain a distance of at least 1 meter (3 feet) from others, especially if they are coughing or sneezing.
4. Wear a mask in public settings, especially when it is difficult to maintain physical distancing.
5. Cover your mouth and nose with your bent elbow or tissue when you cough or sneeze. Dispose of used tissues immediately and wash your hands.
6. Clean and disinfect frequently-touched objects and surfaces regularly, such as tables, doorknobs, and handrails.
7. Stay home if you feel unwell, have a fever, cough, or difficulty breathing. Seek medical attention and follow the advice of healthcare pro

### 3. Text Summarizer

In [None]:
chat_text = """
Given the following context information about COVID-19,
delimited by triple backticks, summarize it and show
the most critical information in at most 5 bullet points.

Context: ```{context}```
"""

prompt = chat_text.format(context=context)

response = get_completion(prompt)
print(response)

- Coronaviruses can cause respiratory infections in animals and humans.
- COVID-19 is the disease caused by the most recently discovered coronavirus.
- The most common symptoms of COVID-19 are fever, dry cough, and tiredness.
- Around 1 out of every 5 people who get COVID-19 become seriously ill and have difficulty breathing.
- COVID-19 spreads primarily through small droplets from the nose or mouth and can also be contracted by touching contaminated surfaces.
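
The prompt above fixes the bullet limit in the text itself, and nothing checks whether the model respected it. The sketch below shows one possible way to make the limit a parameter and to count the bullets in a response. Both helper names (`build_summary_prompt`, `count_bullets`) are hypothetical, and the check assumes the model marks bullets with a leading `-`, as in the output above, which is not guaranteed.

```python
def build_summary_prompt(context: str, max_bullets: int = 5) -> str:
    """Return a summarization prompt with the context wrapped in triple backticks."""
    if max_bullets < 1:
        raise ValueError("max_bullets must be at least 1")
    fence = "`" * 3  # built this way to avoid a literal triple backtick here
    return (
        "Summarize the context delimited by triple backticks.\n"
        f"Show the most critical information in at most {max_bullets} bullet points.\n\n"
        f"Context: {fence}{context}{fence}\n"
    )


def count_bullets(response: str) -> int:
    """Count lines that start with a '-' bullet marker (assumed output style)."""
    return sum(
        1 for line in response.splitlines() if line.lstrip().startswith("-")
    )
```

A call such as `count_bullets(response) <= 5` can then flag responses that ignore the limit, so they can be retried or trimmed.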


### 4. Language Translation

In [None]:
chat_text = """
Given the following sentence in English, output the sentence
and then its translation into both French and German.

Sentence: {sent}
"""

sentence = 'The quick brown fox jumped over the lazy dog'
prompt = chat_text.format(sent=sentence)

response = get_completion(prompt)
print(response)

Sentence: The quick brown fox jumped over the lazy dog

French Translation: Le renard brun rapide a sauté par-dessus le chien paresseux

German Translation: Der schnelle braune Fuchs sprang über den faulen Hund
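
The response above comes back as labeled lines (`Sentence: ...`, `French Translation: ...`). As a rough sketch, the prompt can be generalized to any list of target languages and the labeled lines split into a dictionary. The helper names below are hypothetical, and the parser assumes the model keeps the `Label: text` layout seen above, which may vary between runs.

```python
from typing import Dict, List


def build_translation_prompt(sentence: str, languages: List[str]) -> str:
    """Ask for the sentence followed by one labeled translation per language."""
    if not languages:
        raise ValueError("at least one target language is required")
    targets = ", ".join(languages)
    return (
        "Given the following sentence in English, output the sentence "
        f"and then its translation into each of these languages: {targets}.\n\n"
        f"Sentence: {sentence}\n"
    )


def parse_labeled_lines(response: str) -> Dict[str, str]:
    """Split 'Label: text' lines into a dict, skipping lines without a label."""
    parsed = {}
    for line in response.splitlines():
        # partition splits on the first colon only, so colons in the text survive.
        label, sep, text = line.partition(":")
        if sep and text.strip():
            parsed[label.strip()] = text.strip()
    return parsed
```

With the output shown above, `parse_labeled_lines(response)` would give a dictionary keyed by `Sentence`, `French Translation` and `German Translation`.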


### 5. Zero-shot Text Categorization or Classification


In [None]:
articles = ["Josep Minguella, the agent who took Lionel Messi to Barcelona, has revealed that the Argentine used to look up to Juan Román Riquelme. He recalled an incident from the PSG star's teenage years when he first met the Argentina legend. Speaking to INFOBAE, Minguella revealed that Messi sat at the bottom of a table and kept looking at Riquelme like he was God. He believes that the 2022 FIFA World Cup winner looked up to the Boca Juniors legend as an idol.",
             "Intelsat, operator of one of the world’s largest integrated satellite and terrestrial networks and leading provider of inflight connectivity (IFC), ordered a Mission Extension Pod (MEP) from Northrop Grumman Corporation’s SpaceLogistics, which will add life to an Intelsat satellite and provide uninterrupted services to many customers.",
             "It is now being reported that Morena Baccarin and Stefan Kapicic will reprise their roles as Vanessa and Colossus respectively. It was previously announced that Hugh Jackman will also return as Wolverine for the Ryan Reynolds starrer Deadpool 3. Deadpool 3 keeps getting bigger with the addition of each cast member"
           ]

In [None]:
categories = []

for article in articles:
    # Build one prompt per article; the article text is delimited by triple backticks.
    prompt = f"""
    Act as a news article category classifier.
    Given the following news article delimited by triple backticks,
    classify it into exactly one of the following categories:

    sports, business, technology or entertainment

    Format the output as JSON with the keys "article" and "category".
    ```{article}```
    """
    response = get_completion(prompt)
    categories.append(response)
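
Each entry appended to `categories` is a raw string, not a Python object, and the model is not guaranteed to return valid JSON or to stay within the four labels. The sketch below shows one way to parse and validate a response with the standard `json` module. The helper `parse_category` and the constant `ALLOWED_CATEGORIES` are hypothetical names, and dropping Markdown fence lines is an assumption about how some models wrap their JSON.

```python
import json
from typing import Optional

ALLOWED_CATEGORIES = {"sports", "business", "technology", "entertainment"}


def parse_category(response: str) -> Optional[str]:
    """Return the category from a JSON response, or None if it cannot be trusted."""
    fence = "`" * 3
    # Assumption: if the model wraps the JSON in a Markdown fence, drop those lines.
    lines = [
        line for line in response.strip().splitlines()
        if not line.strip().startswith(fence)
    ]
    try:
        data = json.loads("\n".join(lines))
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    category = str(data.get("category", "")).strip().lower()
    return category if category in ALLOWED_CATEGORIES else None
```

For example, `[parse_category(text) for text in categories]` yields one label (or `None`) per article, which is easier to work with than the printed strings.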

In [None]:
for text in categories:
  print(text)
  print('\n')

{
  "article": "Josep Minguella, the agent who took Lionel Messi to Barcelona, has revealed that the Argentine used to look up to Juan Román Riquelme. He recalled an incident from the PSG star's teenage years when he first met the Argentina legend. Speaking to INFOBAE, Minguella revealed that Messi sat at the bottom of a table and kept looking at Riquelme like he was God. He believes that the 2022 FIFA World Cup winner looked up to the Boca Juniors legend as an idol.",
  "category": "sports"
}


{
  "article": "Intelsat, operator of one of the world’s largest integrated satellite and terrestrial networks and leading provider of inflight connectivity (IFC), ordered a Mission Extension Pod (MEP) from Northrop Grumman Corporation’s SpaceLogistics, which will add life to an Intelsat satellite and provide uninterrupted services to many customers.",
  "category": "technology"
}


{
  "article": "It is now being reported that Morena Baccarin and Stefan Kapicic will reprise their roles as Vane

You now have a good picture of what is possible with `transformers` pipelines and LLMs such as ChatGPT!

Feel free to try these pipelines and prompts with your own inputs.