# Prompt Engineering to Score Each Conversation


## 1. Getting Started

The first thing we'll do is install the [OpenAI Python Library](https://github.com/openai/openai-python/tree/main)!

In [None]:
!pip install openai -q


## 2. Setting Environment Variables

Because we'll frequently use endpoints and APIs hosted by others, we'll often need to handle "secrets" such as API keys.

We'll use the following pattern throughout this bootcamp - but you can use whichever method you're most familiar with.

In [46]:
import os
import getpass


os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key")
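
If you rerun this cell often, one possible variation is to prompt only when the variable is not already set. The sketch below is an illustration, not part of the bootcamp's code: `ensure_secret` and its `prompt` parameter are hypothetical names introduced here.

```python
import getpass
import os
from typing import Callable


def ensure_secret(name: str, prompt: Callable[[str], str] = getpass.getpass) -> None:
    """Prompt for the secret `name` only if it is missing from the environment."""
    if not os.environ.get(name):
        # getpass hides the typed value, so the key never shows up in the notebook output.
        os.environ[name] = prompt(f"{name}: ")
```

Calling `ensure_secret("OPENAI_API_KEY")` would then replace the assignment above and skip the prompt on later runs in the same session.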

## 3. Using the OpenAI Python Library

Let's jump right into it!

> NOTE: You can, and should, reference OpenAI's [documentation](https://platform.openai.com/docs/api-reference/authentication?lang=python) whenever you get stuck, have questions, or want to dive deeper.

### Creating a Client

The core feature of the OpenAI Python Library is the `OpenAI()` client. It's how we're going to interact with OpenAI's models, and it sits under the hood of a lot of what we'll touch on throughout this course.

> NOTE: We could manually provide our API key here, but we're going to instead rely on the fact that we put our API key into the `OPENAI_API_KEY` environment variable!

In [47]:
from openai import OpenAI

openai_client = OpenAI()

### Using the Client

Now that we have our client - we're going to use the `.chat.completions.create` method to interact with the `gpt-3.5-turbo` model.

There are a few things we'll get out of the way first, starting with the idea of "roles".

First, it's important to understand the object that we're going to use to interact with the endpoint. It expects us to send an array of objects of the following format:

```python
{"role" : "ROLE", "content" : "YOUR CONTENT HERE", "name" : "THIS IS OPTIONAL"}
```

Second, there are three "roles" available to use to populate the `"role"` key:

- `system`
- `assistant`
- `user`

OpenAI provides some context for these roles [here](https://help.openai.com/en/articles/7042661-moving-from-completions-to-chat-completions-in-the-openai-api).

We'll explore these roles in more depth as they come up - but for now we're going to just stick with the basic role `user`. The `user` role is, as it would seem, the user!

Third, it expects us to specify a model!

We'll use the `gpt-3.5-turbo` model as stated above.

Let's look at an example!
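
A minimal sketch of such a request, assuming a single `user` message. The prompt text and the `send` helper are hypothetical and only illustrate how the three pieces (roles, message objects, model name) fit together.

```python
# A single-turn conversation: one message object with the `user` role.
messages = [{"role": "user", "content": "Hello, how are you?"}]

# Keyword arguments for `.chat.completions.create`, kept in a dict so the
# request can be inspected before it is sent.
request = {"model": "gpt-3.5-turbo", "messages": messages}


def send(client, request: dict):
    """Send the request with an `OpenAI()` client such as `openai_client`."""
    return client.chat.completions.create(**request)
```

In the notebook, `response = send(openai_client, request)` would perform the actual call.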



Let's look at the response object.

> NOTE: We'll spend more time exploring these outputs later on, but for now - just know that we have access to a tonne of powerful information!
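
As a rough sketch of what that information looks like, the function below pulls a few commonly used fields out of a response. `summarize_response` is a hypothetical helper, and `fake_response` is a made-up stand-in with the same attribute layout as a real response, so the snippet runs without an API call.

```python
from types import SimpleNamespace


def summarize_response(response) -> dict:
    """Collect a few commonly used fields from a chat completion response."""
    choice = response.choices[0]
    return {
        "model": response.model,
        "content": choice.message.content,
        "finish_reason": choice.finish_reason,
        "total_tokens": response.usage.total_tokens,
    }


# Hypothetical stand-in for a real response; every value here is invented.
fake_response = SimpleNamespace(
    model="gpt-3.5-turbo",
    choices=[
        SimpleNamespace(
            message=SimpleNamespace(role="assistant", content="Hi there!"),
            finish_reason="stop",
        )
    ],
    usage=SimpleNamespace(prompt_tokens=9, completion_tokens=3, total_tokens=12),
)

print(summarize_response(fake_response))
```

With a real client, you would pass the object returned by `.chat.completions.create` instead of `fake_response`.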

### Helper Functions

We're going to create some helper functions to aid in using the OpenAI API - just to make our lives a bit easier.

> NOTE: Take some time to understand these functions between classes!

In [5]:
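from IPython.display import display, Markdown
from openai.types.chat import ChatCompletion

def get_response(client: OpenAI, messages: list, model: str = "gpt-3.5-turbo") -> ChatCompletion:
    # Send the conversation to the chat completions endpoint and return the full response object.
    return client.chat.completions.create(
        model=model,
        messages=messages
    )

# The three helpers below wrap a string in a message object for the given role.
def system_prompt(message: str) -> dict:
    return {"role": "system", "content": message}

def assistant_prompt(message: str) -> dict:
    return {"role": "assistant", "content": message}

def user_prompt(message: str) -> dict:
    return {"role": "user", "content": message}

def pretty_print(response: ChatCompletion) -> None:
    # Render the text of the first choice as Markdown in the notebook.
    display(Markdown(response.choices[0].message.content))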

### 🏗️ Sentiment Classifier

In [73]:
# Adjacent string literals are concatenated; the trailing space keeps the two sentences apart.
persona = (
    "You are an expert in customer and service support agent conversation sentiment analysis. "
    "Your task is to classify each conversation as positive, negative, or neutral."
)
examples = [
    {"input": "Thank you so much for helping out", "output": '{"sentiment" : "positive"}'},
    {"input": "Why it did not work? This is so sad.", "output": '{"sentiment" : "negative"}'}
]

system_message = system_prompt(persona)
# Few-shot examples, each formatted as an INPUT/OUTPUT pair.
assistant_messages = [assistant_prompt(f"INPUT: \"{example['input']}\"\nOUTPUT: {example['output']}") for example in examples]

def conversation_sentiment_classifier(conversation: str) -> None:
    user_message = user_prompt(f"INPUT: \"{conversation}\"\nOUTPUT:")
    messages = [system_message] + assistant_messages + [user_message]

    response = get_response(openai_client, messages)
    # pretty_print displays the result in the notebook and returns nothing.
    pretty_print(response)

conversation_sentiment_classifier("I do not have more questions, thanks.")

{"sentiment" : "neutral"}
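
Because the classifier is prompted to answer with JSON, the raw text (`response.choices[0].message.content`) can be parsed and checked before it is used downstream. This is a minimal sketch; `parse_sentiment` and `ALLOWED_LABELS` are hypothetical names, and it assumes the model returns only the JSON object.

```python
import json

ALLOWED_LABELS = {"positive", "negative", "neutral"}


def parse_sentiment(raw: str) -> str:
    """Parse the classifier's JSON output and return a validated label."""
    data = json.loads(raw)
    label = str(data["sentiment"]).strip().lower()
    if label not in ALLOWED_LABELS:
        raise ValueError(f"unexpected sentiment label: {label!r}")
    return label


print(parse_sentiment('{"sentiment" : "neutral"}'))  # neutral
```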

### Chain of Thought Prompting

We'll head one level deeper and explore the world of Chain of Thought prompting (CoT).

This is a process by which we can encourage the LLM to handle slightly more complex tasks.

Let's look at a simple reasoning based example without CoT.

> NOTE: With improvements to `gpt-3.5-turbo`, this example might actually result in the correct response some percentage of the time!
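
One way to set up such a comparison is to build the same question twice, once as-is and once with a "think step by step" instruction appended. The sketch below only constructs the messages; the question, `COT_SUFFIX`, and both helper names are hypothetical and chosen for illustration.

```python
COT_SUFFIX = "Think through your response step by step."


def plain_prompt(question: str) -> dict:
    """User message without any Chain of Thought instruction."""
    return {"role": "user", "content": question}


def cot_prompt(question: str) -> dict:
    """Same question, with an instruction that encourages step-by-step reasoning."""
    return {"role": "user", "content": f"{question}\n\n{COT_SUFFIX}"}


# Hypothetical reasoning question, used only for illustration.
question = "A support agent closes 4 tickets an hour. How many tickets do 3 agents close in 2 hours?"

print(plain_prompt(question)["content"])
print("---")
print(cot_prompt(question)["content"])
```

Sending each message with `get_response(openai_client, [message])` lets you compare the two answers side by side.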

## 3. Prompt Engineering Principles

As you can see, simply asking the LLM to "think about it" (essentially) results in a better-quality response.

There's a [great paper](https://arxiv.org/pdf/2312.16171v1.pdf) that dives into some principles for effective prompt generation.

Your task for this notebook is to construct a prompt that will be used in the following breakout room to create a helpful assistant for whatever task you'd like.

### 🏗️ Activity #2:

There are two subtasks in this activity:

1. Write a `system_template` that leverages 2-3 of the principles from [this paper](https://arxiv.org/pdf/2312.16171v1.pdf)

2. Modify the `user_template` to improve the quality of the LLM's responses.

> NOTE: PLEASE DO NOT MODIFY THE `{input}` in the `user_template`.

In [63]:
system_template = """\
You are an expert in customer support quality assurance review.
Your task is to provide scores (out of 10) for the following attributes:

1. greeting - does the customer support agent greet the customer?
2. problem_solving - has the customer support agent solved the customer's problem?
3. closing - does the conversation end properly?
4. professional - does the agent use professional language?
5. sentiment - is the customer happy with the service?

Also provide a one-sentence summary of what the customer support agent did well or did badly in the conversation.

Please take your time and think through each item step-by-step. When you are done, please provide your response in the following JSON format:

{"greeting" : "score_out_of_10", "problem_solving" : "score_out_of_10", "closing" : "score_out_of_10", "professional" : "score_out_of_10", "sentiment" : "score_out_of_10", "summary":"one_sentence"}
"""

In [70]:
user_template = """{input}
"""

## 4. Testing Your Prompt

Now we can test the prompt you made using an LLM-as-a-judge and see what happens to your score as you modify the prompt.

In [71]:
query = """customer agent conversation"""

list_of_prompts = [
    system_prompt(system_template),
    user_prompt(user_template.format(input=query))
]

test_response = get_response(openai_client, list_of_prompts)

pretty_print(test_response)


Sentiment: Positive

Explanation:
1. The reviewer mentioned that the series is "really addictive," which implies that they found the show engaging and enthralling, indicating a positive sentiment.
2. The use of "just couldn't stop myself from watching this drama!" further emphasizes the addictiveness and enjoyment of the series, indicating a positive sentiment.
3. The reviewer described the cast as "amazing" and mentioned that "Everyone did a great job," which shows appreciation for the performances in the series, indicating a positive sentiment.
4. The statement "My personal favourite was Jung So Min who played the role of Mu Deok" indicates a specific appreciation for a particular actor's performance, which reinforces the positive sentiment.
5. The use of the heart emoji ❤️ at the end of the review signifies a strong positive emotion towards the series.

Overall, the review contains multiple instances of positive language and expressions, highlighting the reviewer's enthusiasm and enjoyment of the series. Therefore, the sentiment of the review is positive.

In [None]:
test_response.choices[0].message.content
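
The raw string from the cell above can be turned into numeric scores once it follows the JSON format requested in `system_template`. The sketch below assumes the model may write some reasoning before the JSON object, so it parses only the span from the first `{` to the last `}`. `parse_scores`, `SCORE_KEYS`, and the sample string are hypothetical and not real model output.

```python
import json

SCORE_KEYS = ("greeting", "problem_solving", "closing", "professional", "sentiment")


def parse_scores(raw: str) -> dict:
    """Extract the JSON object from a response and validate the 0-10 scores."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in the response")
    data = json.loads(raw[start:end + 1])

    scores = {}
    for key in SCORE_KEYS:
        value = int(data[key])  # accepts both 8 and "8"
        if not 0 <= value <= 10:
            raise ValueError(f"{key} is out of range: {value}")
        scores[key] = value
    scores["summary"] = str(data.get("summary", ""))
    return scores


# Made-up response text for illustration only.
sample = (
    "The agent greeted the customer and resolved the issue.\n"
    '{"greeting": "9", "problem_solving": "8", "closing": "7", '
    '"professional": "9", "sentiment": "8", "summary": "Polite and effective."}'
)
print(parse_scores(sample))
```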

In [72]:

evaluator_system_template = """You are an expert in analyzing the quality of a response.

You should be hyper-critical.

Provide scores (out of 10) for the following attributes:

1. Clarity - how clear is the response
2. Faithfulness - how related to the original query is the response
3. Correctness - was the response correct?

Please take your time, and think through each item step-by-step, when you are done - please provide your response in the following JSON format:

{"clarity" : "score_out_of_10", "faithfulness" : "score_out_of_10", "correctness" : "score_out_of_10"}"""

evaluation_template = """Query: {input}
Response: {response}"""

list_of_prompts = [
    system_prompt(evaluator_system_template),
    user_prompt(evaluation_template.format(
        input=query,
        response=test_response.choices[0].message.content
    ))
]

evaluator_response = openai_client.chat.completions.create(
    model="gpt-4o",
    messages=list_of_prompts,
    response_format={"type" : "json_object"}
)


pretty_print(evaluator_response)

{"clarity" : 8, "faithfulness" : 10, "correctness" : 10}
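
To track how the score changes as you modify the prompt, the evaluator's JSON can be parsed and reduced to a single number. This is a minimal sketch using the output shown above; `parse_evaluation` is a hypothetical helper, and averaging the three attributes is just one possible way to combine them.

```python
import json
from statistics import mean

EVAL_KEYS = ("clarity", "faithfulness", "correctness")


def parse_evaluation(raw: str) -> dict:
    """Parse the evaluator's JSON output into integer scores."""
    data = json.loads(raw)
    return {key: int(data[key]) for key in EVAL_KEYS}


scores = parse_evaluation('{"clarity" : 8, "faithfulness" : 10, "correctness" : 10}')
average = round(mean(list(scores.values())), 2)
print(scores, average)
```

In the notebook, the string passed to `parse_evaluation` would be `evaluator_response.choices[0].message.content`.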