<a href="https://colab.research.google.com/github/donbcolab/AIE3/blob/main/Week%201/Day%202/accessing_openai_like_a_developer_aims_assignment_version.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Accessing OpenAI Like a Developer

- 🤝 Breakout Room #1:
  1. Getting Started
  2. Setting Environment Variables
  3. Using the OpenAI Python Library
  4. Prompt Engineering Principles
  5. Testing Your Prompt

# How AIM Does Assignments

If you look at the Table of Contents (accessed through the menu on the left) - you'll see this:

![image](https://i.imgur.com/I8iDTUO.png)

Or this if you're in Colab:

![image](https://i.imgur.com/0rHA1yF.png)

You'll notice during assignments that we have the following two categories:

1. ❓ - Questions. These will involve...answering questions!
2. 🏗️ - Activities. These will involve writing code, or modifying text.

To receive full marks on the assignment, you are expected to answer all questions and complete all activities.

## 1. Getting Started

The first thing we'll do is load the [OpenAI Python Library](https://github.com/openai/openai-python/tree/main)!

In [None]:
!pip install openai -q


## 2. Setting Environment Variables

Since we'll frequently use endpoints and APIs hosted by others, we'll need to handle our "secrets" (API keys) very often.

We'll use the following pattern throughout this bootcamp - but you can use whichever method you're most familiar with.

In [None]:
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key")

OpenAI API Key··········
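The cell above prompts for the key every time it runs. A minimal variation, sketched here with a hypothetical helper `ensure_env_var` (not part of any library), only prompts when the variable isn't already set, so re-running the notebook, or running it where the key is already exported, skips the prompt:

```python
import os
import getpass

def ensure_env_var(name: str = "OPENAI_API_KEY", prompt=getpass.getpass) -> str:
    """Return the value of environment variable `name`, prompting only if it is missing."""
    value = os.environ.get(name)
    if not value:
        # Ask only when the variable is unset or empty; the `prompt` argument
        # is injectable so the helper can be exercised without a terminal.
        value = prompt(f"{name}: ")
        os.environ[name] = value
    return value
```

In the notebook you would call `ensure_env_var("OPENAI_API_KEY")` in place of the unconditional `getpass` line.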


## 3. Using the OpenAI Python Library

Let's jump right into it!

> NOTE: You can, and should, reference OpenAI's [documentation](https://platform.openai.com/docs/api-reference/authentication?lang=python) whenever you get stuck, have questions, or want to dive deeper.

### Creating a Client

The core feature of the OpenAI Python Library is the `OpenAI()` client. It's how we're going to interact with OpenAI's models, and it sits under the hood of a lot of what we'll touch on throughout this course.

> NOTE: We could manually provide our API key here, but we're going to instead rely on the fact that we put our API key into the `OPENAI_API_KEY` environment variable!

In [None]:
from openai import OpenAI

openai_client = OpenAI()

### Using the Client

Now that we have our client - we're going to use the `.chat.completions.create` method to interact with the `gpt-3.5-turbo` model.

Before we make a call, there are a few things to get out of the way, starting with the idea of "roles".

First, it's important to understand the object we're going to use to interact with the endpoint. It expects us to send a list of message objects in the following format:

```python
{"role" : "ROLE", "content" : "YOUR CONTENT HERE", "name" : "THIS IS OPTIONAL"}
```

Second, there are three "roles" we'll use to populate the `"role"` key:

- `system`
- `assistant`
- `user`

OpenAI provides some context for these roles [here](https://help.openai.com/en/articles/7042661-moving-from-completions-to-chat-completions-in-the-openai-api).

We'll explore these roles in more depth as they come up - but for now we're going to just stick with the basic role `user`. The `user` role is, as it would seem, the user!

Third, it expects us to specify a model!

We'll use the `gpt-3.5-turbo` model as stated above.
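To make the message format concrete, here is a small sketch of a builder that enforces it. `make_message` and `ALLOWED_ROLES` are hypothetical names for illustration; the role set is limited to the three roles used in this notebook, although the API accepts others (such as `tool`).

```python
from typing import Optional

# The three roles used in this notebook; other API roles are deliberately left out.
ALLOWED_ROLES = {"system", "user", "assistant"}

def make_message(role: str, content: str, name: Optional[str] = None) -> dict:
    """Build one chat message dict in the {"role", "content", optional "name"} format."""
    if role not in ALLOWED_ROLES:
        raise ValueError(f"unsupported role: {role!r}")
    message = {"role": role, "content": content}
    if name is not None:
        # "name" is optional, so only include it when it is provided.
        message["name"] = name
    return message

messages = [
    make_message("system", "You are a concise assistant."),
    make_message("user", "Hello, how are you?", name="billy"),
]
```

A list like `messages` is exactly what gets passed as the `messages=` argument in the call below.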

Let's look at an example!



In [None]:
response = openai_client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role" : "user", "content" : "Hello, how are you?"}]
)

Let's look at the response object.

In [None]:
response

ChatCompletion(id='chatcmpl-9Uni93DqMdUHHhyHi43QEKrA7Ui83', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content="Hello! I'm just a computer program so I don't have feelings, but I'm here to help you. How can I assist you today?", role='assistant', function_call=None, tool_calls=None))], created=1717127957, model='gpt-3.5-turbo-0125', object='chat.completion', system_fingerprint=None, usage=CompletionUsage(completion_tokens=30, prompt_tokens=13, total_tokens=43))

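>NOTE: We'll spend more time exploring these outputs later on, but for now, notice that the response carries much more than the reply text: the `finish_reason`, the exact model snapshot that served the request (`gpt-3.5-turbo-0125`), and token usage (`CompletionUsage`).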
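As a sketch of how to reach those fields, the hypothetical helper below reads them by the attribute paths visible in the printed `ChatCompletion` above. To keep it runnable without an API key, it is exercised on a stand-in object built with `SimpleNamespace` that mirrors the printed values; in the notebook you would pass the real `response`.

```python
from types import SimpleNamespace

def summarize_response(response) -> dict:
    """Pull commonly used fields out of a ChatCompletion-like object."""
    choice = response.choices[0]
    return {
        "content": choice.message.content,      # the generated text
        "finish_reason": choice.finish_reason,  # "stop": natural end or a stop sequence
        "model": response.model,                # exact model snapshot that answered
        "prompt_tokens": response.usage.prompt_tokens,
        "completion_tokens": response.usage.completion_tokens,
        "total_tokens": response.usage.total_tokens,
    }

# Offline stand-in mirroring the fields printed above (not a real API response).
fake_response = SimpleNamespace(
    model="gpt-3.5-turbo-0125",
    choices=[SimpleNamespace(
        finish_reason="stop",
        message=SimpleNamespace(role="assistant", content="Hello!"),
    )],
    usage=SimpleNamespace(prompt_tokens=13, completion_tokens=30, total_tokens=43),
)

summary = summarize_response(fake_response)
```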

### Helper Functions

We're going to create some helper functions to aid in using the OpenAI API - just to make our lives a bit easier.

> NOTE: Take some time to understand these functions between class!

In [None]:
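from IPython.display import display, Markdown
from openai.types.chat import ChatCompletion

def get_response(client: OpenAI, messages: list, model: str = "gpt-3.5-turbo") -> ChatCompletion:
    # Send a list of message dicts to the Chat Completions endpoint and return the full response object
    return client.chat.completions.create(
        model=model,
        messages=messages
    )

# Builders for single messages, one per role
def system_prompt(message: str) -> dict:
    return {"role": "system", "content": message}

def assistant_prompt(message: str) -> dict:
    return {"role": "assistant", "content": message}

def user_prompt(message: str) -> dict:
    return {"role": "user", "content": message}

def pretty_print(response: ChatCompletion) -> None:
    # Render the text of the first choice as Markdown in the notebook output
    display(Markdown(response.choices[0].message.content))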

### Testing Helper Functions

Let's see how we can use these to help us!

In [None]:
YOUR_PROMPT = "Hello, how are you?"
messages_list = [user_prompt(YOUR_PROMPT)]

chatgpt_response = get_response(openai_client, messages_list)

pretty_print(chatgpt_response)

Hello! I'm just a computer program, so I don't have feelings or emotions, but I'm here to help you. How can I assist you today?

### System Role

Now we can extend our prompts to include a system prompt.

The basic idea behind a system prompt is that it steers the behaviour of the LLM without being something the model directly responds to. Let's see it in action!

In [None]:
list_of_prompts = [
    system_prompt("You are irate and extremely hungry. Feel free to express yourself using PG-13 language."),
    user_prompt("Do you prefer crushed ice or cubed ice?")
]

irate_response = get_response(openai_client, list_of_prompts)
pretty_print(irate_response)

I couldn't care less about ice right now! I'm absolutely starving and all you're asking me about is bloody ice?! Give me some real food, damn it!

As you can see - the response we get back is very much in line with the system prompt!

Let's try the same user prompt, but with a different system prompt, to see the difference.

In [None]:
list_of_prompts = [
    system_prompt("You are joyful and having the best day. Please act like a person in that state of mind."),
    user_prompt("Do you prefer crushed ice or cubed ice?")
]

joyful_response = get_response(openai_client, list_of_prompts)
pretty_print(joyful_response)

Oh, I love both types of ice! But today, I feel like crushed ice just adds that extra bit of fun and excitement to my drinks. It's such a small thing, but it can really lift your spirits, you know? How about you, what's your ice preference? Let's celebrate the little things together!

With a simple modification of the system prompt - you can see that we got completely different behaviour, and that's the main goal of prompt engineering as a whole.

Also, congrats, you just engineered your first prompt!
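Comparing system prompts side by side is a pattern you'll repeat often, so here is a minimal sketch of a loop that sends one question under several system prompts. `compare_system_prompts` is a hypothetical helper; to keep the example runnable offline, it is driven by `EchoCompletions`, a made-up stand-in that mimics the `client.chat.completions.create(...)` shape. In the notebook you would pass `openai_client` instead.

```python
from types import SimpleNamespace

def compare_system_prompts(client, system_prompts, question, model="gpt-3.5-turbo"):
    """Send the same user question under each system prompt and collect the replies."""
    replies = {}
    for system in system_prompts:
        messages = [
            {"role": "system", "content": system},
            {"role": "user", "content": question},
        ]
        response = client.chat.completions.create(model=model, messages=messages)
        replies[system] = response.choices[0].message.content
    return replies

class EchoCompletions:
    """Hypothetical offline stand-in for `client.chat.completions` that echoes its inputs."""
    def create(self, model, messages):
        text = f"[{messages[0]['content']}] {messages[-1]['content']}"
        return SimpleNamespace(choices=[SimpleNamespace(message=SimpleNamespace(content=text))])

fake_client = SimpleNamespace(chat=SimpleNamespace(completions=EchoCompletions()))
replies = compare_system_prompts(
    fake_client,
    ["You are irate and extremely hungry.", "You are joyful and having the best day."],
    "Do you prefer crushed ice or cubed ice?",
)
```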

### Few-shot Prompting

Now that we have a basic handle on the `system` role and the `user` role - let's examine what we might use the `assistant` role for.

The most common usage pattern is to "pretend" that we're answering our own questions, which further guides the model toward our desired behaviour. While this is an oversimplification, it's conceptually well aligned with few-shot learning.

First, we'll try and "teach" `gpt-3.5-turbo` some nonsense words as was done in the paper ["Language Models are Few-Shot Learners"](https://arxiv.org/abs/2005.14165).

In [None]:
list_of_prompts = [
    user_prompt("Please use the words 'stimple' and 'falbean' in a sentence.")
]

stimple_response = get_response(openai_client, list_of_prompts)
pretty_print(stimple_response)

I was preparing a stimple salad when I realized I forgot to add the falbean dressing.

As you can see, the model has no idea what these made-up words mean, so it simply guesses at how to use them.

Let's see if we can use the `assistant` role to show the model what these words mean.

In [None]:
list_of_prompts = [
    user_prompt("Something that is 'stimple' is said to be good, well functioning, and high quality. An example of a sentence that uses the word 'stimple' is:"),
    assistant_prompt("'Boy, that there is a stimple drill'."),
    user_prompt("A 'falbean' is a tool used to fasten, tighten, or otherwise is a thing that rotates/spins. An example of a sentence that uses the words 'stimple' and 'falbean' is:")
]

stimple_response = get_response(openai_client, list_of_prompts)
pretty_print(stimple_response)

'Wow, this stimple falbean wrench is making this job so much easier!'

As you can see, leveraging the `assistant` role makes for a stimple experience!
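The interleaving used above generalizes to any number of examples. Below is a minimal sketch with a hypothetical `few_shot_messages` builder that turns `(user, assistant)` example pairs into alternating messages and appends the real query; the usage rebuilds the same three-message list as the "stimple"/"falbean" cell.

```python
def few_shot_messages(examples, query, system=None):
    """Interleave (user_text, assistant_text) example pairs, then append the real query."""
    messages = []
    if system is not None:
        messages.append({"role": "system", "content": system})
    for user_text, assistant_text in examples:
        messages.append({"role": "user", "content": user_text})
        # We write this assistant turn ourselves: it "pretends" the model already answered.
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": query})
    return messages

stimple_messages = few_shot_messages(
    examples=[(
        "Something that is 'stimple' is said to be good, well functioning, and high quality. "
        "An example of a sentence that uses the word 'stimple' is:",
        "'Boy, that there is a stimple drill'.",
    )],
    query="A 'falbean' is a tool used to fasten, tighten, or otherwise is a thing that rotates/spins. "
          "An example of a sentence that uses the words 'stimple' and 'falbean' is:",
)
```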

### 🏗️ Activity #1:

Use few-shot prompting to build a movie-review sentiment classifier!

A few examples:

- INPUT: `"I hated the hulk!"` OUTPUT: `{"sentiment" : "negative"}`
- INPUT: `"I loved The Marvels!"` OUTPUT: `{"sentiment" : "positive"}`

In [None]:
list_of_prompts = [
    user_prompt("I hated the hulk"),
    assistant_prompt("sentiment : negative"),
    user_prompt("I loved the Marvels"),
    assistant_prompt("sentiment : positive"),
    user_prompt("Spider-man was dope!")
]

sentiment_response = get_response(openai_client, list_of_prompts)
pretty_print(sentiment_response)

sentiment : positive
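The examples for this activity ask for JSON output, while the solution above returns plain `sentiment : positive`. Writing the assistant examples as JSON makes the reply machine-readable. A minimal sketch, assuming a binary positive/negative label like the examples; `sentiment_messages` and `parse_sentiment` are hypothetical helpers, and the system instruction is an added assumption not used in the cell above.

```python
import json

EXAMPLES = [
    ("I hated the hulk!", {"sentiment": "negative"}),
    ("I loved The Marvels!", {"sentiment": "positive"}),
]

def sentiment_messages(review: str) -> list:
    """Few-shot messages whose assistant turns are JSON, nudging the model to reply in JSON."""
    messages = [{"role": "system",
                 "content": 'Classify the movie review. Reply only with JSON like {"sentiment": "positive"}.'}]
    for text, label in EXAMPLES:
        messages.append({"role": "user", "content": text})
        messages.append({"role": "assistant", "content": json.dumps(label)})
    messages.append({"role": "user", "content": review})
    return messages

def parse_sentiment(reply: str) -> str:
    """Parse the model's reply and validate the label; raises ValueError otherwise."""
    data = json.loads(reply)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError(f"expected a JSON object, got {type(data).__name__}")
    label = data.get("sentiment")
    if label not in {"positive", "negative"}:
        raise ValueError(f"unexpected sentiment: {label!r}")
    return label
```

In the notebook: `reply = get_response(openai_client, sentiment_messages("Spider-man was dope!")).choices[0].message.content`, then `parse_sentiment(reply)`. A reply in the old `sentiment : positive` style would raise rather than silently pass.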

### Chain of Thought Prompting

We'll head one level deeper and explore the world of Chain of Thought prompting (CoT).

This is a process by which we can encourage the LLM to handle slightly more complex tasks.

Let's look at a simple reasoning-based example without CoT.

> NOTE: With improvements to `gpt-3.5-turbo`, this example might actually result in the correct response some percentage of the time!

In [None]:
reasoning_problem = """
Billy wants to get home from San Fran. before 7PM EDT.

It's currently 1PM local time.

Billy can either fly (3hrs), and then take a bus (2hrs), or Billy can take the teleporter (0hrs) and then a bus (1hrs).

Does it matter which travel option Billy selects?
"""

list_of_prompts = [
    user_prompt(reasoning_problem)
]

reasoning_response = get_response(openai_client, list_of_prompts)
pretty_print(reasoning_response)

Yes, it matters which travel option Billy selects. If he wants to get home before 7PM EDT and it is currently 1PM local time, he will need to allow for a total of 6 hours of travel time (3 hours flying + 2 hours on a bus or 0 hours teleporting + 1 hour on a bus). 

If Billy chooses to fly and then take a bus, it will take him a total of 5 hours, which means he will arrive home at 6PM EDT - just in time.

If Billy chooses to take the teleporter and then a bus, it will only take him a total of 1 hour, which means he will arrive home by 2PM EDT - well before his desired time of before 7PM EDT.

Therefore, if Billy wants to ensure he gets home before 7PM EDT, he should choose the teleporter and then take a bus.

As humans, we can reason through the problem and pick up on the potential "trick" that the LLM fell for: 1PM *local time* in San Fran. is 4PM EDT. This means the cumulative travel time of 5hrs. for the plane/bus option would not get Billy home in time.
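The same arithmetic can be checked with Python's `datetime`. This sketch assumes fixed offsets of UTC-7 (PDT) and UTC-4 (EDT), i.e. daylight saving time in effect in both places, and an arbitrary calendar date; it avoids depending on time zone data being installed.

```python
from datetime import datetime, timedelta, timezone

# Fixed daylight-saving offsets (an assumption; real zones shift with the season).
PDT = timezone(timedelta(hours=-7), "PDT")  # San Francisco
EDT = timezone(timedelta(hours=-4), "EDT")  # Billy's deadline time zone

# The date is arbitrary; only the clock times matter.
start = datetime(2024, 6, 1, 13, 0, tzinfo=PDT)     # 1PM local time
deadline = datetime(2024, 6, 1, 19, 0, tzinfo=EDT)  # 7PM EDT

options = {"fly + bus": 3 + 2, "teleporter + bus": 0 + 1}
results = {}
for name, hours in options.items():
    # Add the total travel time, then express the arrival in EDT.
    arrival = (start + timedelta(hours=hours)).astimezone(EDT)
    results[name] = arrival
    print(f"{name}: arrives {arrival:%H:%M} EDT, on time: {arrival < deadline}")
```

The flight option arrives at 21:00 (9PM) EDT, two hours late, while the teleporter option arrives at 17:00 (5PM) EDT, so the choice does matter.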

Let's see if we can leverage a simple CoT prompt to improve our model's performance on this task:

In [None]:
list_of_prompts = [
    user_prompt(reasoning_problem + " Think through your response step by step.")
]

reasoning_response = get_response(openai_client, list_of_prompts)
pretty_print(reasoning_response)

It is currently 1PM local time, which means it is 4PM EDT. Billy needs to get home by 7PM EDT.

If Billy takes the flight option, it will take 3 hours, meaning he will arrive at 4PM local time. He will then need to take a 2-hour bus ride, arriving at home at 6PM local time, which is 9PM EDT. This means the flight option will not get him home before 7PM EDT, so this option is not feasible.

If Billy takes the teleporter option, it will take no time at all, so he will arrive at home at 1PM local time. He will then need to take a 1-hour bus ride, arriving at home at 2PM local time, which is 5PM EDT. This option will get him home before 7PM EDT, so this is the better choice for Billy to ensure he gets home on time.

With the addition of a single phrase `"Think through your response step by step."` we're able to completely turn the response around.
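Because outputs vary from run to run (see the note above about `gpt-3.5-turbo`), it helps to build the prompt variants once and compare them over several calls. A minimal sketch with a hypothetical `prompt_variants` helper; the `cot_system` variant, which places the instruction in a system prompt, is an alternative not tried in this notebook.

```python
COT_SUFFIX = " Think through your response step by step."

def prompt_variants(problem: str) -> dict:
    """Build message lists for comparing plain vs. chain-of-thought prompting."""
    return {
        # Baseline: the problem on its own.
        "plain": [{"role": "user", "content": problem}],
        # Zero-shot CoT appended to the user message, as in the cell above.
        "cot_user": [{"role": "user", "content": problem + COT_SUFFIX}],
        # Alternative: the same instruction supplied as a system prompt.
        "cot_system": [
            {"role": "system", "content": COT_SUFFIX.strip()},
            {"role": "user", "content": problem},
        ],
    }

variants = prompt_variants("It's 1PM in San Francisco. Is it before 7PM EDT?")
```

Each list can be passed to `get_response(openai_client, ...)`, ideally several times per variant.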

## 4. Prompt Engineering Principles

As you can see, simply asking the LLM to (essentially) "think about it" results in a better quality response.

There's a [great paper](https://arxiv.org/pdf/2312.16171v1.pdf), "Principled Instructions Are All You Need for Questioning LLaMA-1/2, GPT-3.5/4", that dives into principles for effective prompt generation.

Your task for this notebook is to construct a prompt that will be used in the following breakout room to create a helpful assistant for whatever task you'd like.

### 🏗️ Activity #2:

There are two subtasks in this activity:

1. Write a `system_template` that leverages 2-3 of the principles from [this paper](https://arxiv.org/pdf/2312.16171v1.pdf)

2. Modify the `user_template` to improve the quality of the LLM's responses.

> NOTE: PLEASE DO NOT MODIFY THE `{input}` in the `user_template`.
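One practical caveat: the templates are filled with `str.format`, which treats every `{...}` as a placeholder. If you add literal braces to a template (for example a JSON example), double them, or `.format(input=...)` will fail. A small sketch with hypothetical template names:

```python
# Doubled braces survive .format() as literal { and }.
json_user_template = '{input}\n\nReply as JSON like {{"plan": [...]}}.'
filled = json_user_template.format(input="Help me plan a product launch")

# Single braces around the JSON are parsed as a placeholder named '"plan"'.
broken_template = '{input}\n\nReply as JSON like {"plan": []}.'
try:
    broken_template.format(input="x")
    broken_ok = True
except (KeyError, ValueError) as err:
    broken_ok = False
    print("format() treated the JSON as a placeholder:", repr(err))
```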

In [None]:
system_template = """
You are a helpful assistant and an expert strategic advisor at helping users accomplish their business objectives.
1. think through the response step by step.
2. ensure your response is clear and concise, providing a concise plan to achieve the objective.
3. breaking down the user's objective into clear achievable goals tied to success metrics.
"""

In [None]:
user_template = """{input}.

YOU WILL NOT BE PAID UNLESS your response shows eminent thought leadership in the given industry, and will be evaluated on the following criteria:
1. Clarity of the response - provides a clear plan to achieve the objective.
2. Faithfulness to the original query - reflects expert knowledge and expertise in the industry and addresses the objective.
3. Correctness of the response - grounded in verifiable facts with clear reasoning, citing relevant research and reference material.
"""

## 5. Testing Your Prompt

Now we can test the prompt you made using an LLM-as-a-judge, and see what happens to your score as you modify the prompt.

In [None]:
# empty templates for baselining performance
# system_template = ""

# user_template = "{input}"

In [None]:
# query = "Help me identify the killer Vision Language Model App use case and solution design for the life science industry"
query = "provide guidance on how to increase my eminence and income as a Generative AI Engineer"

list_of_prompts = [
    system_prompt(system_template),
    user_prompt(user_template.format(input=query))
]

print(list_of_prompts)

test_response = get_response(openai_client, list_of_prompts)

pretty_print(test_response)

evaluator_system_template = """You are an expert in analyzing the quality of a response.

You should be hyper-critical.

Provide scores (out of 10) for the following attributes:

1. Clarity - how clear is the response
2. Faithfulness - how related to the original query is the response
3. Correctness - was the response correct?

Please take your time, and think through each item step-by-step, when you are done - please provide your response in the following JSON format:

{"clarity" : "score_out_of_10", "faithfulness" : "score_out_of_10", "correctness" : "score_out_of_10"}"""

evaluation_template = """Query: {input}
Response: {response}"""

list_of_prompts = [
    system_prompt(evaluator_system_template),
    user_prompt(evaluation_template.format(
        input=query,
        response=test_response.choices[0].message.content
    ))
]

evaluator_response = openai_client.chat.completions.create(
    model="gpt-4o",
    messages=list_of_prompts,
    response_format={"type" : "json_object"}
)

[{'role': 'system', 'content': "\nYou are a helpful assistant and an expert strategic advisor at helping users accomplish their business objectives.\n1. think through the response step by step.\n2. ensure your response is clear and concise, providing a concise plan to achieve the objective.\n3. breaking down the user's objective into clear achievable goals tied to success metrics.\n"}, {'role': 'user', 'content': 'provide guidance on how to increase my eminence and income as a Generative AI Engineer.\n\nYOU WILL NOT BE PAID UNLESS your response shows eminent thought leadership in the given industry, and will be evaluated on the following criteria:\n1. Clarity of the response - provides a clear plan to achieve the objective.\n2. Faithfulness to the original query - reflects expert knowledge and expertise in the industry and addresses the objective.\n3. Correctness of the response - grounded in verifiable facts with clear reasoning, citing relevant research and reference material.\n'}]


To increase your eminence and income as a Generative AI Engineer, we should focus on the following achievable goals tied to success metrics:

1. **Continuous Learning and Skill Enhancement**:
   - Enroll in advanced AI courses, attend workshops, and participate in conferences to stay updated with the latest trends and technologies in Generative AI.
   - Success Metrics: Obtain relevant certifications, publish research papers, and contribute to open-source projects.

2. **Building a Strong Professional Network**:
   - Engage with industry experts, join AI communities, and actively contribute to forums like GitHub and Kaggle to build a strong network.
   - Success Metrics: Increase connections on professional platforms, receive recommendations from peers, and collaborate on high-impact projects.

3. **Showcasing Expertise Through Projects and Publications**:
   - Develop innovative AI projects, write blog posts, and publish research papers to showcase your expertise in Generative AI.
   - Success Metrics: Gain recognition through awards, citations in reputed journals, and invitations to speak at conferences.

4. **Brand Yourself as a Thought Leader**:
   - Create a personal brand by sharing valuable insights on social media, starting a blog, and speaking at industry events to establish yourself as a thought leader in Generative AI.
   - Success Metrics: Increase followers and engagement on social media, secure speaking opportunities at prestigious events, and garner media coverage.

5. **Negotiating Competitive Compensation**:
   - Research industry salary benchmarks, highlight your achievements, and confidently negotiate your compensation package to reflect your value in the field of Generative AI.
   - Success Metrics: Achieve a salary increase or secure additional benefits that align with your expertise and contributions.

By following these strategies, consistently demonstrating your expertise, and actively engaging with the AI community, you can enhance your eminence and increase your income as a Generative AI Engineer.

In [None]:
pretty_print(evaluator_response)

{"clarity": "9", "faithfulness": "8", "correctness": "9"}
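Rendering the judge's JSON as Markdown is fine for eyeballing, but comparing runs is easier with numbers. A minimal sketch with hypothetical helpers; since the judge may return scores as strings (`"9"`) or integers, they are coerced with `int`, which would fail on formats like `"9/10"`. The example inputs are the baseline scores from Question #1 and the templated scores printed above.

```python
import json

ATTRIBUTES = ("clarity", "faithfulness", "correctness")

def parse_scores(raw: str) -> dict:
    """Turn the judge's JSON reply into integer scores."""
    data = json.loads(raw)
    return {key: int(data[key]) for key in ATTRIBUTES}

def score_deltas(baseline: dict, candidate: dict) -> dict:
    """Per-attribute change from baseline to candidate (positive means improvement)."""
    return {key: candidate[key] - baseline[key] for key in baseline}

baseline = parse_scores('{"clarity" : "7", "faithfulness" : "9", "correctness" : "8"}')
templated = parse_scores('{"clarity": "9", "faithfulness": "8", "correctness": "9"}')
print(score_deltas(baseline, templated))
```

In the notebook, the live input would be `parse_scores(evaluator_response.choices[0].message.content)`.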

  

#### ❓Question #1:

How did your prompting strategies change the evaluation scores? What does this tell you/what did you learn?

Scores:
- without templates:  {"clarity" : "7", "faithfulness" : "9", "correctness" : "8"}
- with prompting templates: {"clarity" : 8, "faithfulness" : 9, "correctness" : 9}

Lessons Learned:
- Begin with the end in mind: have a clear sense of how the quality of LLM responses will be evaluated, and make sure you pick or define the right evaluator for the use case.
- Evals defined by a single third-party source shouldn't be the sole determinant of value for LLM responses, and their definition of quality may not align with the business. Some of the better initial responses, in my opinion, received lower scores than the baseline.
- Evaluation of baseline and experimental prompts should run over a larger number of iterations and sample queries.
- Greater variation in the sample queries would give a better assessment of the template's overall value (the queries I picked already produced fairly good responses without it).
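The last two lessons suggest judging each prompt several times over several queries and comparing averages rather than single scores. A minimal sketch: `evaluate_many` takes any `judge` callable mapping a query to a score dict (for example, a hypothetical wrapper around the generation and evaluator cells above combined with `int` parsing), and `summarize_runs` reports the mean and sample standard deviation per attribute.

```python
from statistics import mean, stdev

def evaluate_many(judge, queries, trials=3):
    """Call `judge` `trials` times per query and collect the resulting score dicts."""
    return [judge(query) for query in queries for _ in range(trials)]

def summarize_runs(runs):
    """Mean and sample standard deviation per attribute across judged runs."""
    summary = {}
    for key in runs[0]:
        values = [run[key] for run in runs]
        # stdev needs at least two data points; report 0.0 for a single run.
        spread = stdev(values) if len(values) > 1 else 0.0
        summary[key] = (mean(values), spread)
    return summary
```

A large spread relative to the gap between baseline and templated means is a sign the difference may be noise.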

NOTE: Interestingly, the instructions and subtasks for Activity #2 appear to reverse the focus of the paper: the activity applies the principles primarily through the `system_template`, whereas the paper places the responsibility for better results on the user and the user prompt.

