# Inferring
In this lesson, we infer the sentiment of a product review and the topics of a news article.

## Setup

In [None]:
import openai  # this notebook uses the pre-1.0 openai package (openai.ChatCompletion)
import os

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv())  # read the local .env file

openai.api_key = os.getenv('OPENAI_API_KEY')

In [None]:
def get_completion(prompt, model="gpt-3.5-turbo"):
    messages = [{"role": "user", "content": prompt}]
    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        temperature=0, # this is the degree of randomness of the model's output
    )
    return response.choices[0].message["content"]

## Product review text

This review is very similar to the one we saw in the previous lesson.

In [None]:
lamp_review = """
Needed a nice lamp for my bedroom, and this one had \
additional storage and not too high of a price point. \
Got it fast.  The string to our lamp broke during the \
transit and the company happily sent over a new one. \
Came within a few days as well. It was easy to put \
together.  I had a missing part, so I contacted their \
support and they very quickly got me the missing piece! \
Lumina seems to me to be a great company that cares \
about their customers and products!!
"""

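## Sentiment (positive/negative)

<span style=color:red>**This is a binary classification problem: decide whether the review is positive or negative about the product.**</span>  
If this works well enough, we may not need a separate BERT-based classifier for the task.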

In [None]:
prompt = f"""
What is the sentiment of the following product review, 
which is delimited with triple quotes?

Review text: '''{lamp_review}'''
"""
response = get_completion(prompt)
print(response)

In [None]:
prompt = f"""
What is the sentiment of the following product review, 
which is delimited with triple quotes?

Give your answer as a single word, either "positive" \
or "negative".

Review text: '''{lamp_review}'''
"""
response = get_completion(prompt)
print(response)
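
Even with the one-word instruction, the reply can come back capitalized or with a trailing period. Below is a minimal sketch of normalizing it before using it as a label. The helper `parse_sentiment` is hypothetical (it is not part of the original notebook), and the sample replies are written by hand rather than taken from real model output.

```python
def parse_sentiment(response):
    """Map a model reply to 'positive' or 'negative'; return None for anything else."""
    # Trim whitespace first, then surrounding quotes and periods, then lowercase.
    word = response.strip().strip('."\'').lower()
    return word if word in {"positive", "negative"} else None


# Hand-written sample replies (assumptions, not real model output).
print(parse_sentiment("Positive."))
print(parse_sentiment("The sentiment is positive"))
```

Returning `None` for unexpected replies keeps a downstream pipeline from silently treating a full sentence as a label.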

## Identify types of emotions

Here we ask the model to list up to five emotions that the reviewer expresses.  
It may be worth adjusting that limit to the length of the review.

In [None]:
prompt = f"""
Identify a list of emotions that the writer of the \
following review is expressing. Include no more than \
five items in the list. Format your answer as a list of \
lower-case words separated by commas.

Review text: '''{lamp_review}'''
"""
response = get_completion(prompt)
print(response)
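
Because the prompt asks for lower-case words separated by commas, the reply can be turned into a Python list with a few string operations. This is a minimal sketch, assuming a hypothetical helper `parse_emotions` and a hand-written sample reply. The `limit` argument mirrors the "no more than five" instruction in case the model returns more.

```python
def parse_emotions(response, limit=5):
    """Split a comma-separated reply into a cleaned list of at most `limit` items."""
    items = [item.strip().lower() for item in response.split(",")]
    # Drop empty entries (for example, from a trailing comma) before truncating.
    return [item for item in items if item][:limit]


# Hand-written sample reply (an assumption, not real model output).
sample_emotions = parse_emotions("happy, satisfied, grateful, impressed, content")
print(sample_emotions)
```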

## Identify anger

Ask for a binary classification: does the review express anger or not?

In [None]:
prompt = f"""
Is the writer of the following review expressing anger? \
The review is delimited with triple quotes. \
Give your answer as either yes or no.

Review text: '''{lamp_review}'''
"""
response = get_completion(prompt)
print(response)

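## Extract product and company name from customer reviews

The prompt <u>spells out the items to extract as a clear list</u>.  
It also <u>fixes the answer format as JSON and says what to return when a piece of information is missing</u>.  
The more items there are to extract, the more this prompt structure helps.  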

In [None]:
prompt = f"""
Identify the following items from the review text: 
- Item purchased by reviewer
- Company that made the item

The review is delimited with triple quotes. \
Format your response as a JSON object with \
"Item" and "Brand" as the keys. 
If the information isn't present, use "unknown" \
as the value.
Make your response as short as possible.
  
Review text: '''{lamp_review}'''
"""
response = get_completion(prompt)
print(response)
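
When the list of items grows, the same prompt structure can be generated from data instead of being typed by hand. The sketch below assumes a hypothetical helper `build_extraction_prompt` that takes a mapping from JSON key to item description. The wording follows the prompt above, but the helper is an illustration, not part of the original notebook.

```python
def build_extraction_prompt(fields, text):
    """Build an extraction prompt; `fields` maps each JSON key to its description."""
    bullets = "\n".join(f"- {description}" for description in fields.values())
    keys = ", ".join(f'"{key}"' for key in fields)
    return (
        "Identify the following items from the review text:\n"
        f"{bullets}\n\n"
        "Format your response as a JSON object with "
        f"{keys} as the keys.\n"
        'If the information isn\'t present, use "unknown" as the value.\n'
        "Make your response as short as possible.\n\n"
        f"Review text: '''{text}'''"
    )


example_fields = {
    "Item": "Item purchased by reviewer",
    "Brand": "Company that made the item",
}
# A short hand-written review stands in for lamp_review here.
example_prompt = build_extraction_prompt(example_fields, "Great lamp from Lumina.")
print(example_prompt)
```

Adding a new item then only requires one more entry in `example_fields`.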

## Doing multiple tasks at once

This prompt extracts more items and adds more detail about the answer format.  
Note that a value can also be requested as a Boolean (true/false).

In [None]:
prompt = f"""
Identify the following items from the review text: 
- Sentiment (positive or negative)
- Is the reviewer expressing anger? (true or false)
- Item purchased by reviewer
- Company that made the item

The review is delimited with triple quotes. \
Format your response as a JSON object with \
"Sentiment", "Anger", "Item" and "Brand" as the keys.
If the information isn't present, use "unknown" \
as the value.
Make your response as short as possible.
Format the Anger value as a boolean.

Review text: '''{lamp_review}'''
"""
response = get_completion(prompt)
print(response)
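
Since the reply is requested as JSON, it can be loaded into a dictionary and checked before use. The sketch below uses the standard `json` module. The helper `parse_review_json` and the sample reply are hypothetical, written by hand for illustration.

```python
import json


def parse_review_json(response):
    """Parse the JSON reply and check the expected keys and the boolean field."""
    data = json.loads(response)  # raises a ValueError subclass on invalid JSON
    missing = {"Sentiment", "Anger", "Item", "Brand"} - set(data)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if not isinstance(data["Anger"], bool):
        raise ValueError("Anger should be a JSON boolean")
    return data


# Hand-written sample reply (an assumption, not real model output).
sample_reply = '{"Sentiment": "positive", "Anger": false, "Item": "lamp", "Brand": "Lumina"}'
parsed = parse_review_json(sample_reply)
print(parsed["Anger"])
```

JSON `false` becomes Python `False`, so the value can be used directly in an `if` statement.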

## Inferring topics

Here is a news article.

In [None]:
story = """
In a recent survey conducted by the government, 
public sector employees were asked to rate their level 
of satisfaction with the department they work at. 
The results revealed that NASA was the most popular 
department with a satisfaction rating of 95%.

One NASA employee, John Smith, commented on the findings, 
stating, "I'm not surprised that NASA came out on top. 
It's a great place to work with amazing people and 
incredible opportunities. I'm proud to be a part of 
such an innovative organization."

The results were also welcomed by NASA's management team, 
with Director Tom Johnson stating, "We are thrilled to 
hear that our employees are satisfied with their work at NASA. 
We have a talented and dedicated team who work tirelessly 
to achieve our goals, and it's fantastic to see that their 
hard work is paying off."

The survey also revealed that the 
Social Security Administration had the lowest satisfaction 
rating, with only 45% of employees indicating they were 
satisfied with their job. The government has pledged to 
address the concerns raised by employees in the survey and 
work towards improving job satisfaction across all departments.
"""

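## Ask for five topics for the given article

As before, <u>the number of topics to extract can be adjusted to the length of the article</u>.  
This kind of task is especially helpful when you are searching for the right vocabulary to describe a text.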

In [None]:
prompt = f"""
Determine five topics that are being discussed in the \
following text, which is delimited by triple quotes.

Make each item one or two words long. 

Format your response as a list of items separated by commas.

Text sample: '''{story}'''
"""
response = get_completion(prompt)
print(response)

In [None]:
[topic.strip() for topic in response.split(sep=',')]  # split on commas and trim stray whitespace

In [None]:
topic_list = [
    "nasa", "local government", "engineering", 
    "employee satisfaction", "federal government"
]

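## Check whether the article covers specific topics

<u>The prompt checks whether each topic in the list defined above appears in the given article</u>.  
This makes it quick to check and compare the topics or keywords of a text.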

In [None]:
prompt = f"""
Determine whether each item in the following list of \
topics is a topic in the text below, which
is delimited with triple quotes.

Give your answer as a list with 0 or 1 for each topic.

List of topics: {", ".join(topic_list)}

Text sample: '''{story}'''
"""
response = get_completion(prompt)
print(response)

In [None]:
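# Assumes each response line looks like "topic: 0" or "topic: 1"; other formats raise an error.
topic_dict = {line.split(': ')[0].lower(): int(line.split(': ')[1]) for line in response.split(sep='\n')}
if topic_dict.get('nasa') == 1:
    print("ALERT: New NASA story!")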
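
The prompt only asks for "0 or 1 for each topic", so the reply may arrive as `topic: 1` lines or as a bare list such as `[1, 0, 0, 1, 1]`. The sketch below accepts both shapes. The helper `parse_topic_flags` is hypothetical, and the sample replies are written by hand rather than taken from real model output.

```python
import re


def parse_topic_flags(response, topics):
    """Return {topic: 0 or 1} from 'topic: 1' lines or from a bare list of flags."""
    flags = {}
    for line in response.splitlines():
        name, sep, value = line.rpartition(":")
        if sep and value.strip() in ("0", "1"):
            flags[name.strip(" -").lower()] = int(value.strip())
    if flags:
        return flags
    # Fallback: pair standalone 0/1 digits with the topics in order.
    digits = re.findall(r"\b[01]\b", response)
    if len(digits) != len(topics):
        raise ValueError("could not match the reply to the topic list")
    return dict(zip(topics, map(int, digits)))


example_topics = [
    "nasa", "local government", "engineering",
    "employee satisfaction", "federal government",
]
# Hand-written sample reply (an assumption, not real model output).
example_flags = parse_topic_flags("[1, 0, 0, 1, 1]", example_topics)
if example_flags.get("nasa") == 1:
    print("ALERT: New NASA story!")
```

Raising an error when the reply matches neither shape makes a format change visible instead of producing a wrong alert.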

## Try experimenting on your own!