# **Inferring**
In this lesson, you will infer sentiment and topics from product reviews and news articles.

## Setup

In [1]:
from openai import OpenAI
import os

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file

OPENAI_API_KEY  = os.getenv('OPENAI_API_KEY')

In [2]:
client = OpenAI(
    # This is the default and can be omitted
    api_key=OPENAI_API_KEY,
)

def get_completion(prompt, model="gpt-3.5-turbo"):
    messages = [{"role": "user", "content": prompt}]
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=0, # this is the degree of randomness of the model's output
    )
    return response.choices[0].message.content

## Product review text

In [3]:
lamp_review = """
Needed a nice lamp for my bedroom, and this one had \
additional storage and not too high of a price point. \
Got it fast.  The string to our lamp broke during the \
transit and the company happily sent over a new one. \
Came within a few days as well. It was easy to put \
together.  I had a missing part, so I contacted their \
support and they very quickly got me the missing piece! \
Lumina seems to me to be a great company that cares \
about their customers and products!!
"""

## Sentiment (positive/negative)

In [4]:
prompt = f"""
What is the sentiment of the following product review, 
which is delimited with triple quotes?

Review text: '''{lamp_review}'''
"""
response = get_completion(prompt)
print(response)

The sentiment of the review is positive. The reviewer is satisfied with the lamp, the customer service, and the company in general.
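
A prompt's description of its delimiter and the delimiter it actually uses can drift apart as cells are copied and edited. The following is a minimal sketch of one way to keep them in sync. `build_prompt`, `DELIMITER`, and `DELIMITER_NAME` are hypothetical names introduced here for illustration; they are not part of the OpenAI client.

```python
# Assumption: the delimiter is triple single quotes, as in the cells above.
DELIMITER = "'''"
DELIMITER_NAME = "triple quotes"


def build_prompt(instruction: str, text: str, label: str = "Review text") -> str:
    """Wrap `text` in DELIMITER and name that same delimiter in the prompt."""
    return (
        f"{instruction.strip()}\n"
        f"The text is delimited with {DELIMITER_NAME}.\n\n"
        f"{label}: {DELIMITER}{text.strip()}{DELIMITER}\n"
    )


prompt = build_prompt(
    "What is the sentiment of the following product review?",
    "Got it fast. It was easy to put together.",
)
print(prompt)
```

The resulting string can be passed to `get_completion` in the same way as the hand-written prompts.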


In [5]:
prompt = f"""
What is the sentiment of the following product review, 
which is delimited with triple quotes?

Give your answer as a single word, either "positive" \
or "negative".

Review text: '''{lamp_review}'''
"""
response = get_completion(prompt)
print(response)

Positive
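
The model answered "Positive" with a capital letter, and in other runs it might add a period or surrounding whitespace. If the label feeds into later code, it may help to normalize it first. A minimal sketch, where `normalize_sentiment` is a hypothetical helper name:

```python
def normalize_sentiment(raw: str) -> str:
    """Map a one-word model answer to 'positive' or 'negative'."""
    # Drop surrounding whitespace, then stray punctuation or quotes, then case.
    label = raw.strip().strip(".!\"'").lower()
    if label not in {"positive", "negative"}:
        raise ValueError(f"Unexpected sentiment label: {raw!r}")
    return label


print(normalize_sentiment("Positive"))  # prints: positive
```

Raising on anything else makes an unexpected answer visible instead of silently passing it downstream.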


## Identify types of emotions

In [6]:
prompt = f"""
Identify a list of emotions that the writer of the \
following review is expressing. Include no more than \
five items in the list. Format your answer as a list of \
lower-case words separated by commas.

Review text: '''{lamp_review}'''
"""
response = get_completion(prompt)
print(response)

happy, satisfied, grateful, impressed, pleased
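
Because the prompt asks for lower-case words separated by commas, the answer can be turned into a Python list with very little code. A minimal sketch, assuming the model follows the requested format; `parse_comma_list` is a hypothetical helper name:

```python
def parse_comma_list(raw: str) -> list[str]:
    """Split a comma-separated answer into clean, lower-case items."""
    # Tolerate uneven spacing, capital letters, and a trailing period.
    items = [part.strip().rstrip(".").lower() for part in raw.split(",")]
    # Drop empty entries left behind by a trailing comma.
    return [item for item in items if item]


emotions = parse_comma_list("happy, satisfied, grateful, impressed, pleased")
print(emotions)
```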


## Identify anger

In [7]:
prompt = f"""
Is the writer of the following review expressing anger? \
The review is delimited with triple quotes. \
Give your answer as either yes or no.

Review text: '''{lamp_review}'''
"""
response = get_completion(prompt)
print(response)

No


## Extract product and company name from customer reviews

In [8]:
prompt = f"""
Identify the following items from the review text: 
- Item purchased by reviewer
- Company that made the item

The review is delimited with triple quotes. \
Format your response as a JSON object with \
"Item" and "Brand" as the keys. 
If the information isn't present, use "unknown" \
as the value.
Make your response as short as possible.
  
Review text: '''{lamp_review}'''
"""
response = get_completion(prompt)
print(response)

{
  "Item": "lamp",
  "Brand": "Lumina"
}
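
Asking for JSON pays off once the answer is loaded into a Python dictionary. A minimal sketch using the standard-library `json` module; here `response` is a copy of the output shown above, so the example runs without an API call:

```python
import json

# Copy of the model output above, used in place of a live API response.
response = """{
  "Item": "lamp",
  "Brand": "Lumina"
}"""

record = json.loads(response)

# The prompt tells the model to use "unknown" for missing information,
# so the same value is used here as the fallback for a missing key.
item = record.get("Item", "unknown")
brand = record.get("Brand", "unknown")
print(f"{brand} {item}")  # prints: Lumina lamp
```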


## Doing multiple tasks at once

In [9]:
prompt = f"""
Identify the following items from the review text: 
- Sentiment (positive or negative)
- Is the reviewer expressing anger? (true or false)
- Item purchased by reviewer
- Company that made the item

The review is delimited with triple quotes. \
Format your response as a JSON object with \
"Sentiment", "Anger", "Item" and "Brand" as the keys.
If the information isn't present, use "unknown" \
as the value.
Make your response as short as possible.
Format the Anger value as a boolean.

Review text: '''{lamp_review}'''
"""
response = get_completion(prompt)
print(response)

{
    "Sentiment": "positive",
    "Anger": false,
    "Item": "lamp",
    "Brand": "Lumina"
}
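
Formatting `Anger` as a JSON boolean means `json.loads` returns a real Python `bool`, which can drive a decision directly. The sketch below shows one possible use, flagging reviews for follow-up. `needs_escalation` and its escalation rule are assumptions made for illustration, not part of the lesson:

```python
import json


def needs_escalation(raw_json: str) -> bool:
    """Return True when a review looks angry or negative."""
    data = json.loads(raw_json)
    # The prompt allows "unknown", so only a real boolean True counts as anger.
    angry = data.get("Anger") is True
    negative = str(data.get("Sentiment", "")).lower() == "negative"
    return angry or negative


sample = '{"Sentiment": "positive", "Anger": false, "Item": "lamp", "Brand": "Lumina"}'
print(needs_escalation(sample))  # prints: False
```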


## Inferring Text Topics
Another application of LLM inference is deducing the topics of a long piece of text.

The sample this time is a fictitious newspaper article about a government survey measuring employee satisfaction across government agencies. The results show that NASA employees reported the highest satisfaction rating.

In [10]:
story = """
In a recent survey conducted by the government, 
public sector employees were asked to rate their level 
of satisfaction with the department they work at. 
The results revealed that NASA was the most popular 
department with a satisfaction rating of 95%.

One NASA employee, John Smith, commented on the findings, 
stating, "I'm not surprised that NASA came out on top. 
It's a great place to work with amazing people and 
incredible opportunities. I'm proud to be a part of 
such an innovative organization."

The results were also welcomed by NASA's management team, 
with Director Tom Johnson stating, "We are thrilled to 
hear that our employees are satisfied with their work at NASA. 
We have a talented and dedicated team who work tirelessly 
to achieve our goals, and it's fantastic to see that their 
hard work is paying off."

The survey also revealed that the 
Social Security Administration had the lowest satisfaction 
rating, with only 45% of employees indicating they were 
satisfied with their job. The government has pledged to 
address the concerns raised by employees in the survey and 
work towards improving job satisfaction across all departments.
"""

The next prompt asks the model for five topics discussed in the article, each one or two words long, formatted as a comma-separated list. The model returns topics such as Government, Survey, Job satisfaction, and NASA.

In [11]:
prompt = f"""
Determine five topics that are being discussed in the \
following text, which is delimited by triple quotes.

Make each item one or two words long. 

Format your response as a list of items separated by commas without numbering them.

Text sample: '''{story}'''
"""
response = get_completion(prompt)
print(response)

Government, Survey, Job satisfaction, NASA, Social Security Administration


In [12]:
response.split(sep=', ')

['Government',
 'Survey',
 'Job satisfaction',
 'NASA',
 'Social Security Administration']

## Make a news alert for certain topics

The final sample application checks which topics from a predefined list a text covers. First, the list of candidate topics is defined:

In [13]:
topic_list = [
    "nasa", "local government", "engineering", 
    "employee satisfaction", "federal government"
]

In [14]:
prompt = f"""
Determine whether each item in the following list of \
topics is a topic in the text below, which
is delimited with triple quotes.

Give your answer as a dictionary where each key is a topic and each value is 1 if the topic appears in the text and 0 otherwise.

List of topics: {", ".join(topic_list)}

Text sample: '''{story}'''
"""
response = get_completion(prompt)
print(response)

{
    "nasa": 1,
    "local government": 0,
    "engineering": 0,
    "employee satisfaction": 1,
    "federal government": 1
}


In [15]:
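# Splitting on ', ' leaves the response in one piece, because the entries
# are separated by ',\n' (comma + newline) rather than ', '.
for i in response.split(', '):
    print(i)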

{
    "nasa": 1,
    "local government": 0,
    "engineering": 0,
    "employee satisfaction": 1,
    "federal government": 1
}


In [16]:
response = 'nasa: 1,\n    "local government": 0, sports: 3'  # Hand-written sample; overrides the model response from cell 14

# Initialize an empty dictionary to store the parsed results
topic_dict = {}

# Split the response into individual items based on comma
for item in response.split(','):
    try:
        # Further split each item into key-value pairs based on colon
        key, value = item.split(':')
        key = key.strip().strip('"')  # Remove any extra whitespace and quotes
        value = value.strip()  # Remove any extra whitespace

        # Convert the value to an integer
        topic_dict[key] = int(value)
    except ValueError as e:
        # Print error and continue for any items that do not fit the expected format
        print(f"Skipping item due to error: {e} with item: {item}")

# Check for the 'nasa' key and print the alert if needed
if topic_dict.get('nasa') == 1:
    print("ALERT: New NASA story!")
else:
    print("No new NASA story.")


ALERT: New NASA story!
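
The hand-written parser above splits on commas and colons, which breaks if a topic name contains either character. Since the dictionary the model returned in cell 14 is valid JSON, `json.loads` may be a simpler option. A minimal sketch, assuming the response contains a single JSON object, possibly with extra text around it. `topics_present` is a hypothetical helper name, and `raw` is copied from the model output above:

```python
import json

# Copy of the model output from cell 14.
raw = """{
    "nasa": 1,
    "local government": 0,
    "engineering": 0,
    "employee satisfaction": 1,
    "federal government": 1
}"""


def topics_present(raw_response: str) -> set[str]:
    """Return the topics the model flagged with 1."""
    # Keep only the outermost {...} in case the model adds commentary.
    start = raw_response.find("{")
    end = raw_response.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("No JSON object found in response")
    flags = json.loads(raw_response[start:end + 1])
    return {topic for topic, flag in flags.items() if flag == 1}


if "nasa" in topics_present(raw):
    print("ALERT: New NASA story!")
```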


# Exercise
 - Complete the prompts similar to what we did in class. 
     - Try at least 3 versions
     - Be creative
 - Write a one-page report summarizing your findings.
     - Were there variations that didn't work well, i.e., where GPT either hallucinated or gave a wrong answer?
 - What did you learn?

## **Exercise 1: Inferring Sentiment and Topics from a News Article**
#### Step 1: Sentiment Analysis

In [17]:
news_article = """
A new study by researchers at Stanford University 
has found that frequent naps could improve cognitive 
function in older adults. The study, published in 
the Journal of Sleep Research, suggests that 
taking regular naps could help prevent cognitive 
decline and improve memory and alertness.

"We observed a significant improvement in cognitive 
function among participants who took daily naps 
compared to those who did not," said Dr. Emily Chen, 
lead researcher of the study.

The findings have sparked interest among health 
experts, who believe that integrating nap routines 
into daily life could have significant health benefits 
for aging populations.
"""

# Prompt for sentiment analysis
prompt_sentiment = f"""
What is the sentiment of the following news article, 
which is delimited with triple quotes?

News article: '''{news_article}'''
"""
response_sentiment = get_completion(prompt_sentiment)
print(response_sentiment)


The sentiment of the news article is positive, as it discusses the potential benefits of frequent naps in improving cognitive function in older adults and preventing cognitive decline. The article also mentions that the findings have sparked interest among health experts, indicating a positive outlook on the potential health benefits of integrating nap routines into daily life.


#### Step 2: Identify Topics

In [18]:
# Prompt for identifying topics
prompt_topics = f"""
Determine five topics that are being discussed in the 
following news article, which is delimited by triple quotes.

Make each item one or two words long.

Text sample: '''{news_article}'''
"""
response_topics = get_completion(prompt_topics)
print(response_topics)


1. Study
2. Naps
3. Cognitive function
4. Memory
5. Health benefits
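
This prompt did not specify an output format, and the model returned a numbered list rather than the comma-separated list from the earlier lesson cell. If the prompt cannot be changed, the numbering can be removed afterwards. A minimal sketch, where `parse_numbered_list` is a hypothetical helper name and `raw` is copied from the output above:

```python
import re

# Copy of the model output above.
raw = """1. Study
2. Naps
3. Cognitive function
4. Memory
5. Health benefits"""


def parse_numbered_list(text: str) -> list[str]:
    """Strip leading '1.', '2)', '-' or '*' markers from each line."""
    items = []
    for line in text.splitlines():
        cleaned = re.sub(r"^\s*(?:\d+[.)]|[-*])\s*", "", line).strip()
        if cleaned:
            items.append(cleaned)
    return items


print(parse_numbered_list(raw))
```

Adding the format instruction from the lesson prompt ("a list of items separated by commas without numbering them") is the other way to get a consistent shape.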


## Exercise 2: Making a News Alert Based on Topics
#### Step 1: Define Topic List and Check Presence

In [19]:
# Define a list of topics
topic_list = [
    "nap study", "cognitive function", "memory improvement",
    "aging populations", "health benefits"
]

# Prompt to check presence of topics
prompt_alert = f"""
Determine whether each item in the following list of 
topics is a topic in the text below, which
is delimited with triple quotes.

Give your answer as a dictionary where each key is a topic 
and each value is 1 if the topic appears in the text and 0 otherwise.

List of topics: {", ".join(topic_list)}

Text sample: '''{news_article}'''
"""
response_alert = get_completion(prompt_alert)
print(response_alert)


{
  "nap study": 1,
  "cognitive function": 1,
  "memory improvement": 1,
  "aging populations": 1,
  "health benefits": 1
}


#### Step 2: Trigger Alert for Relevant Topic

In [27]:
# Hand-written sample in the style of the API response (values differ from the output above)
response_alert = '''
    "nap study": 1,
    "cognitive function": 0,
    "memory improvement": 0,
    "aging populations": 1,
    "health benefits": 1
'''

# Ensure correct splitting and processing of response_alert
pairs = [pair.strip() for pair in response_alert.split(',')]
topic_dict = {}
for pair in pairs:
    parts = pair.split(':')
    if len(parts) == 2:
        # Clean up key and value
        key = parts[0].strip().strip('"')
        value = int(parts[1].strip())
        topic_dict[key] = value

# Example alert triggering for a specific topic
if topic_dict.get('nap study', 0) == 1:
    print("ALERT: New study on napping impact!")
else:
    print("No alert triggered.")


ALERT: New study on napping impact!
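
The cell above checks a single topic. A news alert often watches several topics at once, and the sample string has no surrounding braces, so it is not yet valid JSON. The sketch below handles both points. `parse_flags` and `WATCHED` are hypothetical names, and the watch list is an arbitrary choice made for illustration:

```python
import json

# Same hand-written sample as in the cell above (no surrounding braces).
fragment = '''
    "nap study": 1,
    "cognitive function": 0,
    "memory improvement": 0,
    "aging populations": 1,
    "health benefits": 1
'''


def parse_flags(raw: str) -> dict[str, int]:
    """Parse a '"topic": 0/1' listing, with or without outer braces."""
    text = raw.strip()
    if not text.startswith("{"):
        text = "{" + text + "}"
    return {str(key): int(value) for key, value in json.loads(text).items()}


WATCHED = {"nap study", "health benefits"}  # hypothetical watch list

flags = parse_flags(fragment)
hits = sorted(topic for topic in WATCHED if flags.get(topic) == 1)
for topic in hits:
    print(f"ALERT: new story on {topic}!")
```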


## Exercise 3: Extracting Product Information from Reviews
#### Step 1: Extract Product and Brand

In [28]:
phone_review = """
Bought this new phone recently. The camera quality is amazing 
and the battery life lasts all day. However, the screen 
resolution could be better. Overall, it's a good purchase.
"""

# Prompt to extract product and brand
prompt_extraction = f"""
Identify the following items from the review text: 
- Item purchased by reviewer
- Brand of the item

The review is delimited with triple quotes.
Format your response as a JSON object with 
"Item" and "Brand" as the keys.
If the information isn't present, use "unknown" 
as the value.

Review text: '''{phone_review}'''
"""
response_extraction = get_completion(prompt_extraction)
print(response_extraction)


{
  "Item": "phone",
  "Brand": "unknown"
}


#### Step 2: Combining Multiple Tasks

In [22]:
# Prompt to combine multiple tasks
prompt_combined = f"""
Identify the following items from the review text: 
- Sentiment (positive or negative)
- Is the reviewer expressing disappointment? (true or false)
- Item purchased by reviewer
- Brand of the item

The review is delimited with triple quotes.
Format your response as a JSON object with 
"Sentiment", "Disappointment", "Item" and "Brand" as the keys.
If the information isn't present, use "unknown" 
as the value.
Make your response as short as possible.
Format the Disappointment value as a boolean.

Review text: '''{phone_review}'''
"""
response_combined = get_completion(prompt_combined)
print(response_combined)


{
    "Sentiment": "positive",
    "Disappointment": false,
    "Item": "phone",
    "Brand": "unknown"
}
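
When several fields are requested at once, it can be worth checking that every key came back with the expected type before using the result. A minimal sketch, where `validate_review_record` and `EXPECTED_TYPES` are hypothetical names and `"unknown"` is accepted for any field because the prompt allows it:

```python
import json

# Keys and types requested by the combined prompt above.
EXPECTED_TYPES = {
    "Sentiment": str,
    "Disappointment": bool,
    "Item": str,
    "Brand": str,
}


def validate_review_record(raw_json: str) -> dict:
    """Parse the response and raise ValueError if a field is missing or mistyped."""
    record = json.loads(raw_json)
    problems = []
    for key, expected in EXPECTED_TYPES.items():
        if key not in record:
            problems.append(f"missing key {key!r}")
        elif record[key] != "unknown" and not isinstance(record[key], expected):
            problems.append(f"{key!r} should be {expected.__name__}")
    if problems:
        raise ValueError("; ".join(problems))
    return record


sample = '{"Sentiment": "positive", "Disappointment": false, "Item": "phone", "Brand": "unknown"}'
print(validate_review_record(sample))
```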


# **Report on Inferring Sentiment and Topics**

### Introduction
In this report, we explore the capabilities of OpenAI's GPT-3.5 Turbo model in inferring sentiment and topics from various texts, specifically product reviews and news articles. We evaluated the model's performance in identifying emotions, sentiments, and topics, as well as its ability to extract specific details from given texts. The aim was to assess the model's accuracy, reliability, and any notable shortcomings in its responses.

### Methodology
We employed a series of prompts designed to test different aspects of text inference. Using Python and the OpenAI API, we issued structured prompts to the model and analyzed its responses. The tasks included:
1. Identifying the sentiment (positive or negative) of a product review.
2. Listing emotions expressed in a product review.
3. Determining if a review expressed anger.
4. Extracting specific details such as the item purchased and the brand from a review.
5. Inferring multiple pieces of information from a review simultaneously.
6. Identifying topics discussed in a news article.
7. Detecting the presence of specific topics within a text.

### Findings

#### Sentiment Analysis
The model accurately identified the sentiment of the product review as positive. Without a format constraint it answered in two full sentences; when asked for a single word it returned "Positive". Both answers were correct, which shows the model's strength in understanding the overall tone of the text.

#### Emotion Identification
When tasked with listing the emotions expressed in the product review, the model provided relevant and appropriate responses: "happy, satisfied, grateful, impressed, pleased". This indicates a good grasp of the underlying emotional context.

#### Anger Detection
The model correctly identified that the review did not express anger. This binary decision task was straightforward for the model, showcasing its capability to discern specific emotional states.

#### Extracting Product and Brand Information
The model effectively extracted the item purchased ("lamp") and the brand ("Lumina") from the review. This demonstrates its ability to identify and summarize key details from descriptive text.

#### Multi-task Inference
In a more complex prompt where multiple details were requested simultaneously, the model provided a well-structured JSON response:
```json
{
    "Sentiment": "positive",
    "Anger": false,
    "Item": "lamp",
    "Brand": "Lumina"
}
```
This response was accurate and formatted correctly, showing the model's proficiency in handling combined tasks.

#### Topic Inference from News Articles
The model successfully identified relevant topics from a news article, returning the list "Government, Survey, Job satisfaction, NASA, Social Security Administration". This suggests a good understanding of the main themes in longer texts.

#### Topic Detection in Targeted Lists
When asked to determine if specific topics were present in the text, the model provided the following output:
```json
{
    "nasa": 1,
    "local government": 0,
    "engineering": 0,
    "employee satisfaction": 1,
    "federal government": 1
}
```
This was accurate and aligned with the content of the article, indicating the model's precision in matching topics to predefined categories.

### Conclusion
The GPT-3.5 Turbo model inferred sentiment and identified topics competently across the product reviews and news articles tested. It interpreted the emotional tone, extracted the requested details, and identified core themes. The main weakness was output format: when the prompt did not specify one, as in the Exercise 1 topic prompt, the model returned a numbered list instead of a comma-separated one, and the multi-line dictionary responses could not be broken up with a simple `split(', ')` and needed more careful parsing. Despite these formatting issues, the model performed well on the examples tested and is practical for sentiment analysis and topic detection.

# **What I learned**

Through testing various text inference tasks with GPT-3.5 Turbo, I learned the following:

1. **Sentiment Analysis**: The model reliably identifies the sentiment of a text, accurately discerning between positive and negative tones.

2. **Emotion Detection**: It effectively extracts a list of emotions from text, capturing nuanced feelings expressed by the writer.

3. **Anger Recognition**: The model can correctly determine the presence or absence of anger in text, providing accurate binary responses.

4. **Detail Extraction**: It successfully identifies specific information, such as product and company names, demonstrating strong detail-oriented understanding.

5. **Handling Combined Tasks**: GPT-3.5 Turbo can manage multi-faceted prompts, delivering structured and concise outputs that integrate multiple pieces of information.

6. **Topic Identification**: It is proficient in recognizing and listing topics from lengthy articles, summarizing key themes effectively.

7. **Targeted Topic Detection**: The model accurately matches text content with predefined topics, though parsing the output may sometimes require careful handling.

Overall, GPT-3.5 Turbo shows strong capabilities in sentiment analysis and topic inference, making it a valuable tool for processing and understanding textual data.