### Task 0: Environment Configuration

#### Step 1: Set up an OpenAI API key
Set up your OpenAI API key below. If you don't have one, create one on OpenAI's website: https://platform.openai.com/api-keys.
This assignment mainly uses **gpt-4o-mini**. Its pricing is listed at https://openai.com/api/pricing/ (\$0.150 / 1M input tokens, \$0.600 / 1M output tokens).
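
As a rough budgeting aid, here is a minimal sketch that turns token counts into a cost estimate using the gpt-4o-mini prices quoted above. The helper name `estimate_cost_usd` is hypothetical, and the prices are hard-coded from this page, so check the pricing link before relying on them.

```python
# Prices quoted above for gpt-4o-mini, in USD per 1M tokens (they may change over time).
INPUT_PRICE_PER_M = 0.150
OUTPUT_PRICE_PER_M = 0.600


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Hypothetical helper: estimated cost of one call from its token counts."""
    return (input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


# Example: a 1,000-token prompt that produces a 500-token answer.
print(f"{estimate_cost_usd(1_000, 500):.6f} USD")  # 0.000450 USD
```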

**NOTE: Please delete your key after you complete this homework. This is your private key that should not be shared with others (including instructor/TA).**

In [None]:
OPENAI_API_KEY = 'API_KEY'  # replace with your own key; do not share or commit it
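
To avoid pasting the key into the notebook at all, one option is to read it from an environment variable. A minimal sketch, assuming you export `OPENAI_API_KEY` in your shell before starting Jupyter; `load_api_key` is a hypothetical helper name. (The `OpenAI()` client also reads `OPENAI_API_KEY` from the environment when no `api_key` argument is passed.)

```python
import os


def load_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Hypothetical helper: return the API key stored in an environment variable."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        # Fail loudly instead of sending an empty key to the API.
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key
```

Usage: `OPENAI_API_KEY = load_api_key()`.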

#### Step 2: Install the openai Python library

To complete this notebook, we will use the "openai" library for calling OpenAI's language models.

Run the following command to install the library with pip.

In [None]:
!pip install openai


Now you should be able to run the following code, which sends the input message "Hello!" and prints the model's response.

Specifically,
- `client = OpenAI(api_key=OPENAI_API_KEY)` creates an API client authenticated with your private key;
- `client.chat.completions.create` calls OpenAI's chat completion endpoint (https://platform.openai.com/docs/api-reference/chat/create);
    - field `model` specifies the LLM to use, here "gpt-4o-mini";
    - field `messages` contains the chat history used to prompt the LLM for a response, including
        - `{"role": "system", "content": "You are a helpful assistant."}`, which sets the system instruction (be a helpful assistant), and
        - `{"role": "user", "content": "Hello!"}`, which provides the user input "Hello!".

The returned chat completion object (https://platform.openai.com/docs/api-reference/chat/object) holds its responses in the list `choices`. Here it contains a single response, `choices[0]`, whose message content is "Hello! How can I assist you today?".
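
The same fields can be read from a plain dictionary, which is handy when logging responses to disk. The sketch below hard-codes a hypothetical, heavily trimmed dict shaped like the documented chat completion object (`choices[i].index`, `choices[i].message.content`, `choices[i].finish_reason`); it is illustrative, not real API output. With the real client you could get such a dict from the returned object, for example via the pydantic `model_dump()` method that openai v1 response objects inherit (check your library version).

```python
# Hypothetical, trimmed response dict mirroring the documented chat completion object.
sample_response = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello! How can I assist you today?"},
            "finish_reason": "stop",
        }
    ],
}


def extract_contents(response: dict) -> list[str]:
    """Return the message text of every choice, ordered by its index."""
    choices = sorted(response["choices"], key=lambda c: c["index"])
    return [c["message"]["content"] for c in choices]


print(extract_contents(sample_response))
```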

You can also have a look at: https://cookbook.openai.com/examples/how_to_format_inputs_to_chatgpt_models

In [None]:
from openai import OpenAI
client = OpenAI(api_key=OPENAI_API_KEY)

completion = client.chat.completions.create(
  model="gpt-4o-mini",
  messages=[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"}
  ]
)

print(completion.choices[0].message.content)

Hello! How can I assist you today?


In this assignment, we will use this chat completion function to prompt gpt-4o-mini on a few tasks. For convenience, let's define the following wrapper function, `ChatCompletion`, on top of OpenAI's chat completion API.

Note that the function passes two additional arguments to the API call:
- `n_samples` is passed as the argument `n` to `client.chat.completions.create`, which specifies how many responses (samples) to request from the LLM;
- `top_p` is passed as the argument `top_p` to `client.chat.completions.create`, which controls nucleus sampling: at each step the model samples only from the smallest set of most-likely tokens whose cumulative probability reaches `top_p` (e.g., `top_p=0.5` keeps the top 50% of the probability mass).
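
To make the effect of `top_p` concrete, here is a toy sketch of nucleus (top-p) filtering over a single next-token distribution: sort tokens by probability, keep the smallest prefix whose cumulative mass reaches `top_p`, renormalize, and sample. The vocabulary and probabilities are invented for illustration, and this is a textbook version of the technique, not OpenAI's server-side implementation.

```python
import random


def nucleus_filter(probs: dict[str, float], top_p: float) -> dict[str, float]:
    """Keep the smallest set of most-likely tokens whose total probability >= top_p."""
    assert 0 < top_p <= 1
    kept, cumulative = {}, 0.0
    for token, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[token] = p
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(kept.values())
    # Renormalize so the kept probabilities sum to 1.
    return {token: p / total for token, p in kept.items()}


def nucleus_sample(probs: dict[str, float], top_p: float, rng: random.Random) -> str:
    """Draw one token from the renormalized nucleus."""
    nucleus = nucleus_filter(probs, top_p)
    tokens, weights = zip(*nucleus.items())
    return rng.choices(tokens, weights=weights, k=1)[0]


# Toy next-token distribution (illustrative numbers only).
toy = {"journey": 0.45, "story": 0.25, "tale": 0.15, "saga": 0.10, "odyssey": 0.05}
print(nucleus_filter(toy, 0.5))  # keeps only "journey" and "story"
print(nucleus_filter(toy, 1.0))  # keeps all five tokens
print(nucleus_sample(toy, 0.5, random.Random(0)))
```

A smaller `top_p` therefore cuts off low-probability continuations; when the distribution is sharply peaked, `top_p=0.5` and `top_p=1.0` may still sample mostly the same high-probability tokens.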

In [None]:
def ChatCompletion(prompt, n_samples=1, top_p=1.0, return_object=False):
    """Send `prompt` to gpt-4o-mini, print the response(s), and optionally return the raw object."""
    assert n_samples >= 1, "n_samples must be at least 1"
    assert 0 < top_p <= 1, "top_p must be in (0, 1]"
    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt}
        ],
        n=n_samples,  # number of independent responses to sample
        top_p=top_p   # nucleus sampling: probability mass to sample from
    )

    if n_samples == 1:
        print("Response: ", completion.choices[0].message.content)
    else:
        print("The call returns %d responses:\n" % n_samples)
        for i, choice in enumerate(completion.choices):
            print("*Response %d*: " % i, choice.message.content)
            print("-" * 10)

    if return_object:
        return completion

### Task 1: Story Generation with Different Sampling Strategies

In this first task, we will explore how a sampling approach called "nucleus sampling" affects generation, by trying different `top_p` configurations.


### Question 1 (5 points)

Can you use the ChatCompletion function to generate a story about an Indian student studying abroad (e.g., at George Mason University)? Please use the default setting and generate only one story.

In [None]:
# YOUR CODE HERE
ChatCompletion("Write a story about an Indian student studying abroad at George Mason University.")

Response:  **Title: A Journey of Discovery**

Arjun Sharma stood nervously at the entrance of George Mason University, clutching the strap of his backpack tighter as he took in the sprawling campus before him. The green lawns and modern buildings teemed with students who seemed to move with purpose, laughter punctuating the crisp autumn air. He had flown halfway across the world from his home in Pune, India, to pursue a Master’s degree in Data Analytics, a field that had captured his imagination ever since he had taken his first computer science class in high school.

As he walked towards the brick façade of the engineering building, memories flooded back: late nights studying with friends back in India, endless cups of chai, and the bittersweet farewell at the airport. The transition was daunting, but Arjun was excited to immerse himself in this new culture and expand his horizons.

His first week was a whirlwind of orientation sessions, new faces, and late-night assignments. He was s

### Question 2 (5 points)

Now, can you do the same but get 2 generations with `top_p` set to 1?

In [None]:
# YOUR CODE HERE
ChatCompletion(
    "Write a story about an Indian student studying abroad at George Mason University.",
    n_samples=2,
    top_p=1.0
)


The call returns 2 responses:

*Response 0*:  **Title: A Journey Beyond Borders**

Aarav Sharma had always dreamed of studying abroad. Growing up in a small town in India, he would spend hours watching documentaries about foreign cultures, and flipping through the glossy pages of travel magazines, captivated by the photographs of iconic buildings and sprawling landscapes. When the acceptance letter from George Mason University arrived in the mail, he couldn’t believe his eyes. It was as if the universe had conspired to transform his dreams into reality.

Arriving in Fairfax, Virginia, in late August, Aarav was welcomed by a sweltering heat that contrasted sharply with the cool breezes of his hometown. The sprawling campus, dotted with trees and modern buildings, buzzed with energy. It was a melting pot of cultures, with students from all over the world filling the air with an array of languages and laughter. He was excited yet nervous—the excitement of new experiences mingled with the 

### Question 3 (5 points)

How about 2 generations with `top_p` set to 0.5?

In [None]:
# YOUR CODE HERE
ChatCompletion(
    "Write a story about an Indian student studying abroad at George Mason University.",
    n_samples=2,
    top_p=0.5
)


The call returns 2 responses:

*Response 0*:  **Title: A Journey Beyond Borders**

Aarav Mehta stood at the entrance of George Mason University, his heart racing with a mix of excitement and anxiety. The sprawling campus in Fairfax, Virginia, was a world away from his hometown of Pune, India. The lush green lawns, modern buildings, and the distant chatter of students felt both inviting and overwhelming. He had dreamt of this moment for years, and now that he was finally here, it felt surreal.

As he walked towards his first class, Aarav reflected on the journey that had brought him to this point. Growing up in a middle-class family, he had always been encouraged to pursue his education with passion. His parents had sacrificed so much to ensure he could chase his dreams. When he received his acceptance letter from George Mason, it felt like a validation of all their hard work.

His first class was Introduction to Computer Science, a subject he had always been passionate about. As he ent

### Question 4 (10 points)

What did you observe from Q1 - Q3? Did the different `top_p` configurations give you the same or different results? Why?

Looking at Q1 through Q3, I noticed something interesting. Even though we changed `top_p` from 1.0 to 0.5, the stories came out very similar. They had similar structures, themes, and even titles (e.g., "A Journey Beyond Borders" appeared at both `top_p=1.0` and `top_p=0.5`). All of them touched on common study-abroad experiences such as culture shock, homesickness, making friends, and academic challenges.

So, did the different `top_p` settings make a difference? Not noticeably, at least in this case.

Why? There are a few likely reasons:

- The most likely words and phrases for a study-abroad story fall inside the nucleus at both settings, so they appear regardless of `top_p`.
- Stories about international students tend to follow similar patterns, so the model's probability mass is already concentrated on a few familiar plots.
- The lower-probability words that `top_p=0.5` excludes did not change the overall story much.
- The prompt itself steers toward a certain kind of story, which may overshadow the effect of changing `top_p`.

Note also that every call samples independently, so outputs differ between runs even at the same `top_p`; with only one or two samples per setting, small differences are hard to attribute to `top_p` alone.

In short, while `top_p` can add variety, it did little here because the study-abroad theme dominated. For more variety, we could adjust other settings (e.g., `temperature`) or make the prompt more specific or unusual.
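
To back up "very similar" with a number rather than an impression, one option is to compare the generated stories pairwise. A minimal sketch using the standard library's `difflib.SequenceMatcher`, whose `ratio()` is a character-level similarity between 0 and 1; the helper name `mean_pairwise_similarity` is hypothetical, and the strings below are placeholder snippets copied from the truncated outputs above, so you would paste in the full story texts.

```python
from difflib import SequenceMatcher
from itertools import combinations


def mean_pairwise_similarity(texts: list[str]) -> float:
    """Average SequenceMatcher ratio over all pairs of texts (1.0 means identical)."""
    pairs = list(combinations(texts, 2))
    if not pairs:
        return 1.0  # a single text is trivially identical to itself
    return sum(SequenceMatcher(None, a, b).ratio() for a, b in pairs) / len(pairs)


# Placeholder snippets from the outputs above; replace with the full stories.
samples = [
    "Arjun Sharma stood nervously at the entrance of George Mason University",
    "Aarav Mehta stood at the entrance of George Mason University",
]
print(round(mean_pairwise_similarity(samples), 2))
```

Comparing the average at `top_p=1.0` against `top_p=0.5`, over more than two samples each, gives a less anecdotal answer; keep in mind that character overlap measures surface wording, not plot.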

### Task 2: gpt-4o-mini for Solving Mathematical Problems

The second task we will try is about solving a math problem.

The math problem we consider is:

> Melanie is a door-to-door saleswoman. She sold a third of her vacuum cleaners at the green house, 2 more to the red house, and half of what was left at the orange house. If Melanie has 5 vacuum cleaners left, how many did she start with?

For your reference, the correct answer should be 18, following the reasoning chain below:

> First multiply the five remaining vacuum cleaners by two to find out how many Melanie had before she visited the orange house: 5 * 2 = 10;
> Then add two to figure out how many vacuum cleaners she had before visiting the red house: 10 + 2 = 12;
> Now we know that 2/3 * x = 12, where x is the number of vacuum cleaners Melanie started with. We can find x by dividing each side of the equation by 2/3, which produces x = 18
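
The reasoning chain above can be checked mechanically. In this small sketch, `remaining_after_sales` replays the three sales forward and `starting_count` undoes them in reverse order, mirroring the reference solution; both names are hypothetical helpers, and `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction


def remaining_after_sales(start: Fraction) -> Fraction:
    """Replay Melanie's sales forward from a starting count."""
    left = start - start / 3  # a third sold at the green house
    left = left - 2           # 2 more sold at the red house
    left = left - left / 2    # half of the rest sold at the orange house
    return left


def starting_count(left_at_end: Fraction) -> Fraction:
    """Undo the sales in reverse order, as in the reference reasoning chain."""
    before_orange = left_at_end * 2      # 5 * 2 = 10
    before_red = before_orange + 2       # 10 + 2 = 12
    return before_red / Fraction(2, 3)   # (2/3) * x = 12  =>  x = 18


x = starting_count(Fraction(5))
print(x, remaining_after_sales(x))  # 18 5
```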


### Question 5 (5 points)
Can you use the ChatCompletion function and prompt gpt-4o-mini to work out the problem?

In [None]:
math_problem = 'Melanie is a door-to-door saleswoman. She sold a third of her vacuum cleaners at the green house, 2 more to the red house, and half of what was left at the orange house. If Melanie has 5 vacuum cleaners left, how many did she start with?'

# YOUR CODE HERE
ChatCompletion(math_problem)


Response:  Let \( x \) be the number of vacuum cleaners Melanie started with.

1. **Sales at the Green House**: 
   She sold a third of her vacuum cleaners at the green house:
   \[
   \text{Sold at green house} = \frac{1}{3}x
   \]
   After this sale, the number of vacuum cleaners left is:
   \[
   x - \frac{1}{3}x = \frac{2}{3}x
   \]

2. **Sales at the Red House**: 
   Next, she sold 2 more vacuum cleaners at the red house:
   \[
   \text{Sold at red house} = 2
   \]
   Now, the number of vacuum cleaners left is:
   \[
   \frac{2}{3}x - 2
   \]

3. **Sales at the Orange House**: 
   She then sold half of what was left at the orange house:
   \[
   \text{Sold at orange house} = \frac{1}{2} \left( \frac{2}{3}x - 2 \right)
   \]
   Now, we need to calculate how many vacuum cleaners are left after this sale. The remaining vacuum cleaners will be:
   \[
   \left( \frac{2}{3}x - 2 \right) - \frac{1}{2} \left( \frac{2}{3}x - 2 \right)
   \]
   To simplify that, we find the remaining part:


Did gpt-4o-mini solve the problem correctly? If not, where did it go wrong?

Yes, gpt-4o-mini solved the problem correctly:
- it correctly identified the mathematical relationship at each stage of Melanie's sales;
- it set up appropriate equations to model the problem;
- it performed the algebraic manipulations accurately;
- its verification step confirmed that the solution is consistent with the problem's conditions.

### Question 6 (10 points)

Now, try to get 10 solutions from gpt-4o-mini with `top_p` set to 0.7.

In [None]:
# YOUR CODE HERE
ChatCompletion(
    math_problem,
    n_samples=10,
    top_p=0.7
)


The call returns 10 responses:

*Response 0*:  Let \( x \) be the number of vacuum cleaners Melanie started with.

1. She sold a third of her vacuum cleaners at the green house:
   \[
   \text{Vacuum cleaners sold at green house} = \frac{x}{3}
   \]
   After this sale, the number of vacuum cleaners left is:
   \[
   x - \frac{x}{3} = \frac{2x}{3}
   \]

2. She sold 2 more at the red house:
   \[
   \text{Vacuum cleaners left after red house} = \frac{2x}{3} - 2
   \]

3. She sold half of what was left at the orange house:
   \[
   \text{Vacuum cleaners sold at orange house} = \frac{1}{2} \left( \frac{2x}{3} - 2 \right)
   \]
   Therefore, the number of vacuum cleaners left after the orange house is:
   \[
   \frac{2x}{3} - 2 - \frac{1}{2} \left( \frac{2x}{3} - 2 \right)
   \]

   Let's simplify this expression. First, calculate \( \frac{1}{2} \left( \frac{2x}{3} - 2 \right) \):
   \[
   \frac{1}{2} \left( \frac{2x}{3} - 2 \right) = \frac{2x}{6} - 1 = \frac{x}{3} - 1
   \]

   Now, subst

You may see multiple different answers produced by gpt-4o-mini. Summarize the answers in a table in your report. Did gpt-4o-mini get all of the solutions right? If there are any mistakes, what are the common errors that gpt-4o-mini makes?

gpt-4o-mini answered correctly in all 10 solutions. Each response solved the problem with sound logical and mathematical reasoning, consistently applying appropriate mathematical principles, and no mistakes were found.

Common errors: since every solution was correct, there were no errors or common mistakes to analyze in gpt-4o-mini's responses.
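
Filling in the summary table is easier if the final numbers are pulled out of each response automatically. A rough sketch: take the last number in each response as its final answer (a heuristic that can misfire, e.g. when a response ends with a verification line) and count how often each value appears. Picking the most frequent value is the idea behind self-consistency, i.e. majority voting over sampled solutions. The regex and helper names are assumptions; with the wrapper above you could call `ChatCompletion(..., return_object=True)` and pass each `choice.message.content`. The example strings are made up for illustration, not actual model outputs.

```python
import re
from collections import Counter
from typing import Optional

NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def final_number(response_text: str) -> Optional[str]:
    """Heuristic: treat the last number in the text as the final answer."""
    matches = NUMBER.findall(response_text.replace(",", ""))  # "1,234" -> "1234"
    return matches[-1] if matches else None


def tally_answers(responses: list[str]) -> Counter:
    """Count how many responses end in each final answer."""
    return Counter(a for a in map(final_number, responses) if a is not None)


responses = [  # illustrative strings only
    "... Melanie started with **18** vacuum cleaners.",
    "... so x = 18.",
    "... she started with 15 vacuum cleaners.",
]
counts = tally_answers(responses)
print(counts.most_common())  # [('18', 2), ('15', 1)]
```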

### Question 7 (10 points)

Can you try other ways to prompt gpt-4o-mini to give correct solutions more stably? Be creative!

It may help to design your prompt with multiple math problems in mind, so we provide another one below:

The problem is:
> John drives for 3 hours at a speed of 60 mph and then turns around because he realizes he forgot something very important at home. He tries to get home in 4 hours but spends the first 2 hours in standstill traffic. He spends the next half-hour driving at a speed of 30 mph, before being able to drive the remaining time of the 4 hours going at 80 mph. How far is he from home at the end of those 4 hours?

For your reference, the correct answer is 45:
> When he turned around he was 3*60=180 miles from home
> He was only able to drive 4-2=2 hours in the first four hours.
> In half an hour he goes 30*.5=15 miles. He then drives another 2-.5=1.5 hours. In that time he goes 80*1.5=120 miles. So he drove 120+15=135 miles
> So he is 180-135=45 miles away from home
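
The same kind of check works for John's trip. A tiny sketch that replays the reference solution line by line (the function name is hypothetical; distances in miles):

```python
def distance_from_home() -> float:
    """Replay John's trip from the reference solution."""
    outbound = 3 * 60                # 180 miles away when he turns around
    moving_time = 4 - 2              # only 2 of the 4 hours are spent moving
    slow = 30 * 0.5                  # 15 miles in the half-hour at 30 mph
    fast = 80 * (moving_time - 0.5)  # 120 miles in the remaining 1.5 hours
    return outbound - (slow + fast)  # 180 - 135 = 45


print(distance_from_home())  # 45.0
```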

Include your prompt design and the answer in your report. Why do you think it works (or does not)?

I designed the prompt by placing one fully worked example (the Melanie problem with its step-by-step solution) before the target problem. This method, known as few-shot prompting (here, one-shot), guides the model to follow the same reasoning process.

Why did this work well? A few reasons:

- The model could see a concrete pattern for solving this kind of problem.
- It got a clear roadmap for working through each step.
- It understood how detailed the answer should be.
- Following a structured method reduces careless setup and arithmetic mistakes.

In short, by showing a worked example first, the prompt makes the expected output clear, so we get not just the answer but also the reasoning behind it.
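
The prompt design above generalizes to any number of worked examples. A minimal sketch of a helper that joins (problem, solution) pairs in the same `Problem:` / `Solution:` layout as the code cell that follows, leaving the final `Solution:` open for the model to complete. The helper name `build_few_shot_prompt`, the instruction line, and the toy arithmetic examples are assumptions, not part of the assignment's code.

```python
def build_few_shot_prompt(
    examples: list[tuple[str, str]],
    target_problem: str,
    instruction: str = "Solve the last problem step by step, following the format of the examples.",
) -> str:
    """Join worked examples and the target problem into one prompt string."""
    parts = [instruction, ""]
    for problem, solution in examples:
        parts += [f"Problem:\n{problem}", "", f"Solution:\n{solution}", "", "---", ""]
    # Leave the final solution open so the model continues from here.
    parts += [f"Problem:\n{target_problem}", "", "Solution:"]
    return "\n".join(parts)


prompt = build_few_shot_prompt(
    examples=[("What is 2 + 3?", "2 + 3 = 5.\n**Answer**: 5")],
    target_problem="What is 4 + 4?",
)
print(prompt)
```

Passing several GSM8K-style problems with their solutions as `examples` turns the one-shot prompt into a true few-shot prompt; the result can be sent with `ChatCompletion(prompt)`.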

In [None]:
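# YOUR CODE HERE
# One worked example (problem + step-by-step solution) followed by the target problem.
# Use a raw string (r'''...''') so LaTeX commands such as \text, \frac, and \right
# are not turned into Python escape sequences (tab, form feed, carriage return).
example_problem = r'''Problem: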
Melanie is a door-to-door saleswoman. She sold a third of her vacuum cleaners at the green house, 2 more to the red house, and half of what was left at the orange house. If Melanie has 5 vacuum cleaners left, how many did she start with?

Solution:
Let \( x \) be the number of vacuum cleaners Melanie started with.

1. **Sales at the Green House**:
   She sold a third of her vacuum cleaners:
   \[
   \text{Remaining after green house} = x - \frac{1}{3}x = \frac{2}{3}x
   \]

2. **Sales at the Red House**:
   She sold 2 more vacuum cleaners:
   \[
   \text{Remaining after red house} = \frac{2}{3}x - 2
   \]

3. **Sales at the Orange House**:
   She sold half of what was left:
   \[
   \text{Remaining after orange house} = \frac{1}{2} \left( \frac{2}{3}x - 2 \right)
   \]

   According to the problem, she has 5 vacuum cleaners left:
   \[
   \frac{1}{2} \left( \frac{2}{3}x - 2 \right) = 5
   \]

4. **Solve for \( x \)**:
   Multiply both sides by 2:
   \[
   \frac{2}{3}x - 2 = 10
   \]
   Add 2 to both sides:
   \[
   \frac{2}{3}x = 12
   \]
   Multiply both sides by \( \frac{3}{2} \):
   \[
   x = 12 \times \frac{3}{2} = 18
   \]

**Answer**:
Melanie started with **18 vacuum cleaners**.

---

'''

target_problem = '''Problem:
John drives for 3 hours at a speed of 60 mph and then turns around because he realizes he forgot something very important at home. He tries to get home in 4 hours but spends the first 2 hours in standstill traffic. He spends the next half-hour driving at a speed of 30 mph, before being able to drive the remaining time of the 4 hours going at 80 mph. How far is he from home at the end of those 4 hours?

Solution:
'''

# Combine the prompt
full_prompt = example_problem + target_problem

# Use the ChatCompletion function
ChatCompletion(full_prompt)


Response:  To solve the problem, we can break it down into parts based on John's journey.

1. **Initial Journey to His Destination**:
   John drives for 3 hours at a speed of 60 mph. The distance he covers in this time can be calculated as:
   \[
   \text{Distance} = \text{Speed} \times \text{Time} = 60 \, \text{mph} \times 3 \, \text{hours} = 180 \, \text{miles}
   \]

2. **Return Journey (Total Time: 4 hours)**:
   John turns around and tries to get home in 4 hours but faces standstill traffic.

   - **First 2 Hours in Standstill Traffic**:
   In the first 2 hours, John does not cover any distance:
   \[
   \text{Distance} = 0 \, \text{miles}
   \]

   - **Next Half-Hour at 30 mph**:
   In the next half-hour (0.5 hours), he drives at 30 mph:
   \[
   \text{Distance} = \text{Speed} \times \text{Time} = 30 \, \text{mph} \times 0.5 \, \text{hours} = 15 \, \text{miles}
   \]

   - **Remaining Time Driving at 80 mph**:
   The total time John has been traveling so far is:
   \[
   2 \, \te

#### Acknowledgement: The math problems used in this notebook come from the GSM8k dataset: Training Verifiers to Solve Math Word Problems, Cobbe et al., 2021. https://huggingface.co/datasets/gsm8k