# Classification & Extraction
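In this lesson, you will use prompts to infer the sentiment and emotions expressed in a passenger review, extract details such as location and company from it, and infer the topics of a news article.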

## Setup

In [None]:
import os

import openai
from dotenv import load_dotenv, find_dotenv

_ = load_dotenv(find_dotenv())  # read the local .env file

openai.api_key = os.getenv('OPENAI_API_KEY')
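`os.getenv` returns `None` when the variable is not set, so a missing key would otherwise only surface later as an authentication error from the API. A minimal sketch of an early check, assuming the key is stored in `OPENAI_API_KEY` as above; `require_env` is a hypothetical helper, not part of `openai` or `python-dotenv`:

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, failing fast if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        # Hypothetical helper: raise here instead of sending an unauthenticated request later
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value
```

Used in the setup cell, this would read `openai.api_key = require_env('OPENAI_API_KEY')`.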

In [None]:
def get_completion(prompt, model="gpt-3.5-turbo"):
    messages = [{"role": "user", "content": prompt}]
    # openai.ChatCompletion is the legacy (pre-1.0) SDK interface; it was removed in openai>=1.0
    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        temperature=0,  # degree of randomness of the model's output (0 = most deterministic)
        max_tokens=1024,  # maximum number of tokens the model may generate in its response
    )
    return response.choices[0].message["content"]

## Passenger feedback text

In [None]:
pax_review = """
Hi I would like to complain about the extremely rude behaviour and poor job performance by the two women in the attached pictures. \
The incident happened at 9:00am on Sunday 23 May 2015 at the basement taxi waiting area. There are two separate taxi queues in Basement 1. \
I could see a continuous stream of taxis at the first taxi queue, whereas the taxi queue #2 received only a fraction of that number resulting in a much longer waiting period, while the other queue was moving much faster. \
When I requested the two gentlemen to try and get more taxis allocated to the second queue, they rudely asked me not to bother them and that I should go to the other queue or take a Grab car if I didn't want to wait. \
Airport has always had very intelligent staff and excellent customer service, however this was a terrible experience to return to after a long break from travel. Please do help to fix the queuing system so that taxis are distributed equally across passengers. \
Please also provide feedback to these two staff that their rude and hostile behaviour is unhelpful to passengers. \
Kind regards Mahir
"""

## Infer Sentiment (positive/negative)

In [None]:
prompt = f"""
What is the sentiment of the following passenger review, 
which is delimited with triple backticks?

Review text: ```{pax_review}```
"""
response = get_completion(prompt)
print(response)

In [None]:
prompt = f"""
What is the sentiment of the following passenger review, 
which is delimited with triple backticks?

Give your answer as a single word, either "positive" \
or "negative".

Review text: ```{pax_review}```
"""
response = get_completion(prompt)
print(response)
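Even with the single-word instruction, a model may reply with `Negative.` or `"negative"`, varying capitalization, punctuation, or quotes. A small post-processing step makes the reply safe to compare in code. This is a sketch assuming only the two labels named in the prompt are valid; `normalize_sentiment` is a hypothetical helper:

```python
import string

ALLOWED_SENTIMENTS = {"positive", "negative"}


def normalize_sentiment(reply: str) -> str:
    """Map a reply such as 'Negative.' to 'negative', or raise if it is not an allowed label."""
    # Strip surrounding whitespace, punctuation and quotes, then lower-case
    label = reply.strip(string.punctuation + string.whitespace).lower()
    if label not in ALLOWED_SENTIMENTS:
        raise ValueError(f"Unexpected sentiment reply: {reply!r}")
    return label
```

For example, `normalize_sentiment(response)` returns a clean label that can drive an `if` statement.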

## Identify types of emotions

In [None]:
prompt = f"""
Identify a list of emotions that the writer of the \
following review is expressing. Include no more than \
five items in the list. Format your answer as a list of \
lower-case words separated by commas.

Review text: '''{pax_review}'''
"""
response = get_completion(prompt)
print(response)
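The reply is a single comma-separated string. A sketch of turning it into a Python list, assuming the model follows the comma format; `parse_emotions` is a hypothetical helper that also enforces the five-item limit from the prompt on the client side:

```python
from typing import List


def parse_emotions(reply: str, max_items: int = 5) -> List[str]:
    """Split a comma-separated reply into lower-case emotion words, dropping blanks and duplicates."""
    emotions = []
    for item in reply.split(","):
        word = item.strip().strip(".").lower()
        if word and word not in emotions:
            emotions.append(word)
    # The prompt asks for at most five items; enforce that limit here too
    return emotions[:max_items]
```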

## Identify anger

In [None]:
prompt = f"""
Is the writer of the following review expressing anger? \
The review is delimited with triple backticks. \
Give your answer as either yes or no.

Review text: ```{pax_review}```
"""
response = get_completion(prompt)
print(response)
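A yes/no answer is most useful as a boolean, for example to route angry reviews to a supervisor. A minimal sketch, assuming the reply is a single word possibly followed by punctuation; `parse_yes_no` is a hypothetical helper:

```python
def parse_yes_no(reply: str) -> bool:
    """Interpret a yes/no model reply (e.g. 'Yes.' or 'no') as a boolean."""
    word = reply.strip().strip(".!").lower()
    if word in ("yes", "true"):
        return True
    if word in ("no", "false"):
        return False
    raise ValueError(f"Expected yes or no, got: {reply!r}")
```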

## Extract location and company name from passenger reviews

In [None]:
prompt = f"""
Identify the following items from the review text: 
- Location specified by reviewer
- Company mentioned by reviewer, if any

The review is delimited with triple backticks. \
Format your response as a JSON object with \
"Location" and "Company" as the keys. 
If the information isn't present, use "unknown" \
as the value.
Make your response as short as possible.

Review text: ```{pax_review}```
"""
response = get_completion(prompt)
print(response)
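The prompt asks for JSON, but `response` is still just a string. To use the fields in code, the reply has to be parsed. This is a minimal sketch using the standard `json` module, assuming the reply contains one JSON object (possibly surrounded by extra text or a code fence) and that missing keys should default to "unknown" as the prompt requests. `parse_extraction` is a hypothetical helper:

```python
import json

EXPECTED_KEYS = ("Location", "Company")


def parse_extraction(reply: str, keys=EXPECTED_KEYS) -> dict:
    """Parse the model's JSON reply, filling any missing key with "unknown"."""
    # Take the outermost {...} in case the model added text or a code fence around it
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError(f"No JSON object found in: {reply!r}")
    data = json.loads(reply[start:end + 1])  # raises json.JSONDecodeError on invalid JSON
    return {key: data.get(key, "unknown") for key in keys}
```

After `details = parse_extraction(response)`, the values are available as `details["Location"]` and `details["Company"]`.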

## Doing multiple tasks at once

In [None]:
prompt = f"""
Identify the following items from the review text: 
- Sentiment (positive or negative)
- Is the reviewer expressing anger? (true or false)
- Location specified by reviewer
- Company mentioned by reviewer, if any

The review is delimited with triple backticks. \
Format your response as a JSON object with \
"Sentiment", "Anger", "Location" and "Company" as the keys.
If the information isn't present, use "unknown" \
as the value.
Make your response as short as possible.
Format the Anger value as a boolean.

Review text: ```{pax_review}```
"""
response = get_completion(prompt)
print(response)
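Because the prompt asks for Anger as a boolean, the parsed value should be a Python `True` or `False`, but a model can still return the string "true" or capitalize the sentiment. This sketch parses the reply and normalizes those two fields, assuming the four keys named in the prompt; `parse_review_summary` is a hypothetical helper:

```python
import json

KEYS = ("Sentiment", "Anger", "Location", "Company")


def parse_review_summary(reply):
    """Parse the multi-task JSON reply and normalize the Sentiment and Anger fields."""
    # Take the outermost {...} in case the model wrapped the JSON in extra text
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError(f"No JSON object found in: {reply!r}")
    data = json.loads(reply[start:end + 1])

    # Fill missing keys with "unknown", as the prompt instructs
    summary = {key: data.get(key, "unknown") for key in KEYS}
    summary["Sentiment"] = str(summary["Sentiment"]).strip().lower()

    # Anger should already be a JSON boolean; also accept "true"/"false" strings
    if isinstance(summary["Anger"], str):
        summary["Anger"] = {"true": True, "false": False}.get(
            summary["Anger"].strip().lower(), "unknown")
    return summary
```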

## Inferring topics from a news article

In [None]:
story = """
In a recent survey conducted by the government, 
public sector employees were asked to rate their level 
of satisfaction with the department they work at. 
The results revealed that NASA was the most popular 
department with a satisfaction rating of 95%.

One NASA employee, John Smith, commented on the findings, 
stating, "I'm not surprised that NASA came out on top. 
It's a great place to work with amazing people and 
incredible opportunities. I'm proud to be a part of 
such an innovative organization."

The results were also welcomed by NASA's management team, 
with Director Tom Johnson stating, "We are thrilled to 
hear that our employees are satisfied with their work at NASA. 
We have a talented and dedicated team who work tirelessly 
to achieve our goals, and it's fantastic to see that their 
hard work is paying off."

The survey also revealed that the 
Social Security Administration had the lowest satisfaction 
rating, with only 45% of employees indicating they were 
satisfied with their job. The government has pledged to 
address the concerns raised by employees in the survey and 
work towards improving job satisfaction across all departments.
"""

## Infer 5 topics

In [None]:
prompt = f"""
Determine five topics that are being discussed in the \
following text, which is delimited by triple backticks.

Make each item one or two words long. 

Format your response as a list of items separated by commas.

Text sample: ```{story}```
"""
response = get_completion(prompt)
print(response)

In [None]:
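# Split the comma-separated reply into a list, trimming the space after each comma
[topic.strip() for topic in response.split(',')]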

In [None]:
topic_list = [
    "nasa", "local government", "engineering", 
    "employee satisfaction", "federal government"
]

## Make a news alert for certain topics

In [None]:
prompt = f"""
Determine whether each item in the following list of \
topics is a topic in the text below, which
is delimited with triple backticks.

Give your answer as one line per topic, in the form "topic: 0" or "topic: 1".

List of topics: {", ".join(topic_list)}

Text sample: ```{story}```
"""
response = get_completion(prompt)
print(response)

In [None]:
topic_dict = {}

# Parse reply lines of the form "topic: 0" / "topic: 1" into {"topic": 0 or 1}
for line in response.splitlines():
    if ':' not in line:
        continue  # skip blank lines and any line not in "topic: value" form
    topic, value = line.rsplit(':', 1)
    topic = topic.strip(' -*').lower()  # drop list markers such as "- "
    value = value.strip().rstrip('.')
    if value in ('0', '1'):
        topic_dict[topic] = int(value)

if topic_dict.get('nasa') == 1:
    print("ALERT: New NASA story!")
else:
    print("No NASA story found.")
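The same idea extends to watching several topics at once. This is a minimal sketch, assuming a dictionary that maps lower-case topic names to 0 or 1; `topic_alerts` and the watch list are hypothetical:

```python
def topic_alerts(topic_flags: dict, watched: list) -> list:
    """Return an alert message for each watched topic that the model flagged with 1."""
    return [
        f"ALERT: New {topic} story!"
        for topic in watched
        if topic_flags.get(topic.lower()) == 1  # missing topics count as not flagged
    ]
```

For example, `for alert in topic_alerts(topic_dict, ["NASA", "employee satisfaction"]): print(alert)` prints one line per flagged topic.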

## Try experimenting on your own!