# Inferring

One of the most useful capabilities of LLMs is inferring sentiment and topics from text such as product reviews and articles.

**Inferring** => the model takes an input text and performs some kind of analysis on it:
  - extracting labels, topics, names...
  - understanding the sentiment of a text

Points in support of LLMs:
  - If you want to extract the sentiment (positive or negative) from a piece of text in a traditional machine learning workflow,<br/>
  you have to collect a labeled dataset, train the model, figure out how to deploy it somewhere in the cloud, and then run inference.
  - That can work pretty well, but it is a lot of work to go through.
  - Also, for every task (sentiment, extracting names, or something else) you have to train and deploy a separate model.
  - One of the really nice things about LLMs is that for many tasks like these, you can just write a prompt.
  - This gives a tremendous speed-up in application development.
  - You use one model and one API for many different tasks, rather than figuring out how to train and deploy many different models.

In [1]:
import openai
import os

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file

openai.api_key = os.getenv('OPENAI_API_KEY')

In [2]:
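def get_completion(prompt, model="gpt-3.5-turbo"):
    """Send a single user message to the chat model and return the text of its reply."""
    messages = [{"role": "user", "content": prompt}]
    # Note: openai.ChatCompletion is the interface of the openai package before version 1.0
    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        temperature=0, # this is the degree of randomness of the model's output
    )
    return response.choices[0].message["content"]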

## Product review text

This is the product review from which the sentiment will be extracted.<br/>
The same kind of analysis can be applied to other texts, such as articles or customer emails.

In [3]:
lamp_review = """
Needed a nice lamp for my bedroom, and this one had \
additional storage and not too high of a price point. \
Got it fast.  The string to our lamp broke during the \
transit and the company happily sent over a new one. \
Came within a few days as well. It was easy to put \
together.  I had a missing part, so I contacted their \
support and they very quickly got me the missing piece! \
Lumina seems to me to be a great company that cares \
about their customers and products!!
"""

## Sentiment (positive/negative)

In [4]:
prompt = f"""
What is the sentiment of the following product review, 
which is delimited with triple quotes?

Review text: '''{lamp_review}'''
"""
response = get_completion(prompt)
print(response)

The sentiment of the product review is positive.


You can ask the model for a more concise response to make post-processing easier.<br/>
Here the prompt gets one more instruction: answer with a single word, either positive or negative.<br/>
That makes it easier to take the output, process it, and do something with it.

In [5]:
prompt = f"""
What is the sentiment of the following product review, 
which is delimited with triple quotes?

Give your answer as a single word, either "positive" \
or "negative".

Review text: '''{lamp_review}'''
"""
response = get_completion(prompt)
print(response)

positive
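
As a rough sketch of that post-processing step, the helper below maps the raw reply to a fixed label. The function name `normalize_sentiment` and the `"unknown"` fallback are assumptions made for this illustration; they are not part of the original notebook.

```python
def normalize_sentiment(response: str) -> str:
    """Map a raw model reply to 'positive', 'negative', or 'unknown'."""
    # Drop surrounding whitespace, quotes, and trailing punctuation, then lower-case.
    cleaned = response.strip().strip("\"'.!").lower()
    # Anything that is not exactly one of the two labels is treated as unknown.
    return cleaned if cleaned in {"positive", "negative"} else "unknown"


print(normalize_sentiment("positive"))
print(normalize_sentiment(" Positive.\n"))
```

A full sentence such as the reply to the first prompt would fall into the `"unknown"` branch, which is one reason to ask for a single word in the first place.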


## Identify types of emotions

LLMs are pretty good at extracting specific things out of a piece of text.<br/>
The emotions can be useful for understanding how your customers feel about a particular product.<br/>
This approach gathers a list of emotions that helps you classify or better understand the customer.

In [6]:
prompt = f"""
Identify a list of emotions that the writer of the \
following review is expressing. Include no more than \
five items in the list. Format your answer as a list of \
lower-case words separated by commas.

Review text: '''{lamp_review}'''
"""
response = get_completion(prompt)
print(response)

happy, satisfied, grateful, impressed, content
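
Because the reply is a comma-separated string, it can be turned into a Python list with a few lines. This is only a sketch: `parse_emotions` is a hypothetical helper, and it assumes the model followed the formatting instruction.

```python
def parse_emotions(response: str, limit: int = 5) -> list[str]:
    """Split a comma-separated reply into a clean list of lower-case emotions."""
    # Strip the whitespace around each item and normalize the case.
    items = [part.strip().lower() for part in response.split(",")]
    # Drop empty entries (e.g. from a trailing comma) and enforce the item limit.
    return [item for item in items if item][:limit]


emotions = parse_emotions("happy, satisfied, grateful, impressed, content")
print(emotions)
```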


## Identify anger

For a lot of customer support organizations, it's important to understand if a particular user is extremely upset.<br/>
So you might have a different classification problem: is the writer angry?

In [8]:
prompt = f"""
Is the writer of the following review expressing anger? \
The review is delimited with triple quotes. \
Give your answer as either yes or no.

Review text: '''{lamp_review}'''
"""
response = get_completion(prompt)
print(response)

No
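
To act on a yes/no reply in code, it helps to convert it to a boolean. The sketch below assumes the reply is a single word, possibly capitalized or followed by a period; `parse_yes_no` is a hypothetical helper name.

```python
from typing import Optional


def parse_yes_no(response: str) -> Optional[bool]:
    """Return True for 'yes', False for 'no', and None for anything else."""
    cleaned = response.strip().strip(".!").lower()
    if cleaned == "yes":
        return True
    if cleaned == "no":
        return False
    # The model did not follow the instruction; let the caller decide what to do.
    return None


is_angry = parse_yes_no("No")
print(is_angry)
```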


Notice that if I had wanted to build all of these classifiers with supervised learning,<br/>
there's no way I could have done it in just a few minutes.

## Extract product and company name

Information extraction is the part of NLP (Natural Language Processing) that deals with taking a piece of text and extracting the specific things
you want to know from it.

For instance, if you are trying to summarize many reviews from an online shopping website,<br/>
it can be useful to figure out what the items were and who made them, so that you can track trends in positive or negative sentiment<br/>
for specific items or specific manufacturers.

In the next prompt, the model is asked to identify the following items: the item purchased and the name of the company that made it.

The model is also instructed to format the response as a JSON object with "Item" and "Brand" as the keys.<br/>
This makes it easier to load the output into a Python dictionary for post-processing.

In [9]:
prompt = f"""
Identify the following items from the review text: 
- Item purchased by reviewer
- Company that made the item

The review is delimited with triple quotes. \
Format your response as a JSON object with \
"Item" and "Brand" as the keys. 
If the information isn't present, use "unknown" \
as the value.
Make your response as short as possible.

Review text: '''{lamp_review}'''
"""
response = get_completion(prompt)
print(response)

{
  "Item": "lamp",
  "Brand": "Lumina"
}
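
Loading a reply like this into a dictionary can be sketched with the standard `json` module. The helper name `parse_item_brand` and the decision to fall back to `"unknown"` on malformed output are assumptions for this example; the sample string mirrors the reply shown above.

```python
import json


def parse_item_brand(response: str) -> dict:
    """Parse the model's JSON reply, falling back to 'unknown' on bad output."""
    try:
        data = json.loads(response)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    # Keep only the two keys the prompt asked for.
    return {
        "Item": data.get("Item", "unknown"),
        "Brand": data.get("Brand", "unknown"),
    }


sample_response = '{\n  "Item": "lamp",\n  "Brand": "Lumina"\n}'
record = parse_item_brand(sample_response)
print(record["Item"], record["Brand"])
```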


## Doing multiple tasks at once

In the examples we've gone through, you saw how to write a prompt to **recognize the sentiment**, figure out if someone is angry, and then<br/>
also extract some **richer information** such as the item and the brand.

One way to extract all of this information would be to use three or four prompts and call "get_completion" multiple times.

But it turns out you can actually write a single prompt to extract all of this information at the same time.

In [14]:
prompt = f"""
Identify the following items from the review text: 
- Sentiment (positive or negative)
- Is the reviewer expressing anger? (true or false)
- Item purchased by reviewer
- Company that made the item

The review is delimited with triple quotes. \
Format your response as a JSON object with \
"Sentiment", "Anger", "Item" and "Brand" as the keys.
If the information isn't present, use "unknown" \
as the value.
Make your response as short as possible.
Format the Anger value as a boolean.

Review text: '''{lamp_review}'''
"""
response = get_completion(prompt)
print(response)

{
  "Sentiment": "positive",
  "Anger": false,
  "Item": "lamp with additional storage",
  "Brand": "Lumina"
}
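
Once every review is reduced to a record like this, tracking sentiment per brand becomes a counting exercise. The sketch below is illustrative only: the first reply is the one shown above, the second is a made-up example, and `tally_sentiment_by_brand` is a hypothetical helper.

```python
import json
from collections import Counter

raw_responses = [
    # Reply from the prompt above.
    '{"Sentiment": "positive", "Anger": false, '
    '"Item": "lamp with additional storage", "Brand": "Lumina"}',
    # Hypothetical reply for a second review, invented for this example.
    '{"Sentiment": "negative", "Anger": true, '
    '"Item": "desk fan", "Brand": "unknown"}',
]


def tally_sentiment_by_brand(records):
    """Count how often each (brand, sentiment) pair occurs."""
    counts = Counter()
    for record in records:
        counts[(record["Brand"], record["Sentiment"])] += 1
    return counts


records = [json.loads(raw) for raw in raw_responses]
counts = tally_sentiment_by_brand(records)

# "Anger" arrives as a real boolean because the prompt asked for one.
angry_items = [r["Item"] for r in records if r["Anger"] is True]
print(counts)
print(angry_items)
```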


## Inferring topics

One of the cool applications of LLMs is inferring topics.<br/>
Given a long piece of text, what is it about? What are its topics?

In [15]:
story = """
In a recent survey conducted by the government, 
public sector employees were asked to rate their level 
of satisfaction with the department they work at. 
The results revealed that NASA was the most popular 
department with a satisfaction rating of 95%.

One NASA employee, John Smith, commented on the findings, 
stating, "I'm not surprised that NASA came out on top. 
It's a great place to work with amazing people and 
incredible opportunities. I'm proud to be a part of 
such an innovative organization."

The results were also welcomed by NASA's management team, 
with Director Tom Johnson stating, "We are thrilled to 
hear that our employees are satisfied with their work at NASA. 
We have a talented and dedicated team who work tirelessly 
to achieve our goals, and it's fantastic to see that their 
hard work is paying off."

The survey also revealed that the 
Social Security Administration had the lowest satisfaction 
rating, with only 45% of employees indicating they were 
satisfied with their job. The government has pledged to 
address the concerns raised by employees in the survey and 
work towards improving job satisfaction across all departments.
"""

## Infer 5 topics

In [21]:
prompt = f"""
Determine five topics that are being discussed in the \
following text, which is delimited by triple quotes.

Make each item one or two words long.

Format your response as a list of items separated by commas.

Text sample: '''{story}'''
"""
response = get_completion(prompt)
print(response)

government survey, public sector employees, job satisfaction, NASA, Social Security Administration


In [22]:
response.split(sep=',')

['government survey',
 ' public sector employees',
 ' job satisfaction',
 ' NASA',
 ' Social Security Administration']
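
Note that splitting on `','` alone leaves a leading space on every item after the first. A small variation strips it; `sample_response` below simply repeats the reply shown above so the snippet runs on its own.

```python
sample_response = (
    "government survey, public sector employees, job satisfaction, "
    "NASA, Social Security Administration"
)

# Split on commas, then remove the whitespace around each topic.
topics = [topic.strip() for topic in sample_response.split(",")]
print(topics)
```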

## Make a news alert for certain topics

Imagine you have a collection of articles and have extracted their topics. You can then also use a large language model to help you index the articles by topic.

So, let's suppose we've extracted the following list of topics:

In [23]:
topic_list = [
    "nasa", 
    "local government",
    "engineering", 
    "employee satisfaction",
    "federal government"
]

And let's say you want to figure out, given a news article, which of these topics are covered in it.

In [24]:
prompt = f"""
Determine whether each item in the following list of \
topics is a topic in the text below, which
is delimited with triple quotes.

Give your answer as a list with 0 or 1 for each topic.

List of topics: {", ".join(topic_list)}

Text sample: '''{story}'''
"""
response = get_completion(prompt)
print(response)

nasa: 1
local government: 0
engineering: 0
employee satisfaction: 1
federal government: 1


In machine learning this is sometimes called "zero-shot learning",<br/>
because we didn't give the model any labeled training data for the task.

With just a prompt, it was able to determine which of these topics are covered in that news article.

Keep in mind that if this prompt were going into production, it would be better to ask for the output in JSON format rather than as a list,<br/>
because the plain-text output of a large language model can be a little inconsistent.

**Using JSON format is a more robust way to output results.**

In [25]:
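# Parse lines such as "nasa: 1" into {topic: flag}, skipping lines without ': '
pairs = (line.split(': ', 1) for line in response.splitlines() if ': ' in line)
topic_dict = {topic: int(flag) for topic, flag in pairs}
if topic_dict.get('nasa') == 1:
    print("ALERT: New NASA story!")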

ALERT: New NASA story!
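
A JSON-based version of the same alert could look like the sketch below. It only builds the prompt and parses a reply, without calling the API. The helper names `build_topic_prompt` and `parse_topic_flags`, the prompt wording, and the sample reply are all assumptions made for illustration.

```python
import json


def build_topic_prompt(topics, text):
    """Build a prompt that asks for a JSON object of topic -> 0/1 flags."""
    return (
        "Determine whether each item in the following list of topics "
        "is a topic in the text below, which is delimited with triple quotes.\n\n"
        "Give your answer as a JSON object whose keys are the topics "
        "and whose values are 0 or 1.\n\n"
        f"List of topics: {', '.join(topics)}\n\n"
        f"Text sample: '''{text}'''"
    )


def parse_topic_flags(response, topics):
    """Return {topic: 0 or 1} for the known topics; {} if the reply is not valid JSON."""
    try:
        data = json.loads(response)
    except json.JSONDecodeError:
        return {}
    if not isinstance(data, dict):
        return {}
    # Ignore unexpected keys and values that are not 0 or 1.
    return {t: int(data[t]) for t in topics if t in data and data[t] in (0, 1)}


topics = ["nasa", "local government", "engineering"]
prompt = build_topic_prompt(topics, "NASA topped the employee survey.")

# Hypothetical model reply, written by hand for this example.
sample_reply = '{"nasa": 1, "local government": 0, "engineering": 0}'
flags = parse_topic_flags(sample_reply, topics)
if flags.get("nasa") == 1:
    print("ALERT: New NASA story!")
```

Using `flags.get("nasa")` instead of indexing means a malformed reply results in no alert rather than a `KeyError`.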
