## Homework 2, Part 1, CS678 Fall 2024

### This is due on **October 11th, 2024**. Please read the report PDF for submission instructions.
### **Note that this is only Part 1 of the homework.**

#### **IMPORTANT**: After copying this notebook to Google Drive or OneDrive, please paste a link to your copy into the PDF report (under "Your Notebook solution"). To get a publicly accessible link, click the *Share* button at the top right, then click "Get shareable link" and copy the result. If you fail to do this, you will receive no credit for this homework!

---

##### *How to do this problem set:*

- Some questions require writing Python code and computing results, and the rest of them have written answers. For coding problems, you will have to fill out all code blocks that say `YOUR CODE HERE`.

- This assignment is designed so that you can run all cells almost instantly. If it is taking longer than that, you have made a mistake in your code.

- Note that there are more questions in the PDF than the ones present in this notebook (which only includes the ones requiring code).

---

##### *How to submit this problem set:*
- After filling in the missing code, provide all the answers in the LaTeX template released with the assignment. Once again, you should create a shareable link to your completed notebook and paste it into the LaTeX report. The PDF report compiled from the LaTeX template should be submitted to Gradescope.
  
---

##### *Academic honesty*

- We will audit the notebooks from a set number of students, chosen at random. The audits will check that the code you wrote actually generates the answers in your PDF. If you turn in correct answers on your PDF without code that actually generates those answers, we will consider this a serious case of cheating. See the course page for honesty policies.

- We will also run automatic checks of notebooks for plagiarism. Copying code from others is also considered a serious case of cheating.

---

### Task 0: Environment Configuration

#### Step 1: Set up an OpenAI API key
Set up your OpenAI API key below. If you don't have one, create one on OpenAI's website: https://platform.openai.com/api-keys.
This assignment will mainly use **gpt-4o-mini**. Its pricing can be found here: https://openai.com/api/pricing/ (\$0.150 / 1M input tokens, \$0.600 / 1M output tokens).
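
To get a rough feel for what the assignment might cost, the quoted prices can be turned into a small estimator. This is only a sketch: the function name `estimate_cost` and the example workload are assumptions made for illustration, and real token counts depend on your prompts and the model's responses.

```python
# Prices quoted above for gpt-4o-mini, in USD per 1M tokens.
INPUT_PRICE_PER_M = 0.150
OUTPUT_PRICE_PER_M = 0.600


def estimate_cost(input_tokens, output_tokens):
    """Return the estimated cost in USD for the given token counts."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


# Hypothetical workload (made-up numbers): 50 calls with roughly
# 100 input tokens and 500 output tokens each.
print(f"Estimated cost: {estimate_cost(50 * 100, 50 * 500):.4f} USD")
```

Even with generous assumptions, the total for this notebook should stay at a small fraction of a dollar.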

**NOTE: Please delete your key after you complete this homework. This is your private key that should not be shared with others (including instructor/TA).**

In [2]:
OPENAI_API_KEY= ''
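
If you would rather not type the key into the notebook at all, one option is to read it from an environment variable. The sketch below assumes you have already set a variable named `OPENAI_API_KEY` in your environment; the helper name `load_api_key` is hypothetical and not part of the assignment.

```python
import os


def load_api_key(env_var="OPENAI_API_KEY"):
    """Read the API key from an environment variable; return '' if it is unset."""
    return os.environ.get(env_var, "")


OPENAI_API_KEY = load_api_key()
if not OPENAI_API_KEY:
    print("No API key found; set OPENAI_API_KEY before running the cells below.")
```

With this approach the key never appears in the saved notebook, so there is nothing to delete before sharing the link.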

#### Step 2: Install the openai Python library

To complete this notebook, we will use the "openai" library for calling OpenAI's language models.

Execute the following command to pip install the library.

In [3]:
!pip install openai

Collecting openai
  Downloading openai-1.51.0-py3-none-any.whl.metadata (24 kB)
Collecting httpx<1,>=0.23.0 (from openai)
  Downloading httpx-0.27.2-py3-none-any.whl.metadata (7.1 kB)
Collecting jiter<1,>=0.4.0 (from openai)
  Downloading jiter-0.5.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.6 kB)
Collecting httpcore==1.* (from httpx<1,>=0.23.0->openai)
  Downloading httpcore-1.0.6-py3-none-any.whl.metadata (21 kB)
Collecting h11<0.15,>=0.13 (from httpcore==1.*->httpx<1,>=0.23.0->openai)
  Downloading h11-0.14.0-py3-none-any.whl.metadata (8.2 kB)
Downloading openai-1.51.0-py3-none-any.whl (383 kB)
Downloading httpx-0.27.2-py3-none-any.whl (76 kB)
Downloading httpcore-1.0.6-py3-none-any.whl (78 kB)

Now, you should be able to run the following code, which gives a response to an input message "Hello!"

Specifically,
- `client = OpenAI(api_key=OPENAI_API_KEY)` creates a client that authenticates with your private API key;
- `client.chat.completions.create` calls OpenAI's chat completion function (https://platform.openai.com/docs/api-reference/chat/create);
    - Field `model` specifies the LLM version to use, here being "gpt-4o-mini"
    - Field `messages` contains the chat history which is used to prompt the LLM for a response, including
        - `{"role": "system", "content": "You are a helpful assistant."}` which specifies the system description (being a helpful assistant),
        - `{"role": "user", "content": "Hello!"}` which specifies the user input "Hello!"

The returned chat completion object (https://platform.openai.com/docs/api-reference/chat/object) includes one possible response (`choices[0]`), whose message content is "Hello! How can I assist you today?"

You can also have a look at: https://cookbook.openai.com/examples/how_to_format_inputs_to_chatgpt_models
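
Beyond `message.content`, the completion object also carries a `finish_reason` for each choice and a `usage` block with token counts. The sketch below shows one way to collect these fields. The helper name `summarize_completion` is hypothetical, and the `fake` object is a hand-built stand-in with made-up token counts so that the example runs without an API call; it is not a real API response.

```python
from types import SimpleNamespace


def summarize_completion(completion):
    """Collect the response texts, finish reasons, and total token usage."""
    return {
        "texts": [c.message.content for c in completion.choices],
        "finish_reasons": [c.finish_reason for c in completion.choices],
        "total_tokens": completion.usage.total_tokens,
    }


# Stand-in object for illustration only (token counts are made up).
fake = SimpleNamespace(
    choices=[
        SimpleNamespace(
            message=SimpleNamespace(content="Hello! How can I assist you today?"),
            finish_reason="stop",
        )
    ],
    usage=SimpleNamespace(prompt_tokens=19, completion_tokens=9, total_tokens=28),
)

print(summarize_completion(fake))
```

With a real response, you would pass the object returned by `client.chat.completions.create` instead of `fake`.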

In [4]:
from openai import OpenAI
client = OpenAI(api_key=OPENAI_API_KEY)

completion = client.chat.completions.create(
  model="gpt-4o-mini",
  messages=[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"}
  ]
)

print(completion.choices[0].message.content)

Hello! How can I assist you today?


In this assignment, we will use this chat completion function to prompt gpt-4o-mini for a few tasks. For convenience, let's define the following wrapper function, called `ChatCompletion`, on top of OpenAI's chat completion API.

Note that in the function, we have included two additional arguments to the API call:
- `n_samples` is passed as the argument `n` to `client.chat.completions.create`, which specifies the number of samples requested from the LLM;
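- `top_p` is passed as the argument `top_p` to `client.chat.completions.create`, which restricts sampling to the smallest set of most likely tokens whose cumulative probability reaches `top_p` (for example, `top_p=0.5` samples only from the tokens that make up the top 50% of the probability mass).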

In [5]:
def ChatCompletion(prompt, n_samples=1, top_p=1.0, return_object=False):
    assert n_samples >= 1
    assert top_p <= 1 and top_p > 0
    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt}
        ],
        n=n_samples,
        top_p=top_p
    )

    if n_samples == 1:
        print("Response: ", completion.choices[0].message.content)
    else:
        print("The call returns %d responses:\n" % n_samples)
        for i in range(n_samples):
            print("*Response %d*: " % i, completion.choices[i].message.content)
            print("-" * 10)

    if return_object:
        return completion

### Task 1: Story Generation with Different Sampling Strategies

In the first task, we will explore how a sampling approach called "nucleus sampling" affects generation. We will try different configurations of it by varying `top_p`.
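
To build intuition for what `top_p` does, here is a toy sketch of nucleus sampling over a single next-token distribution. It is not how the OpenAI API is implemented internally; the function names and the four-token distribution are made up for illustration.

```python
import random


def nucleus_filter(probs, top_p):
    """Keep the most likely tokens whose cumulative probability reaches top_p,
    then renormalize so the kept probabilities sum to 1."""
    assert 0 < top_p <= 1
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


def sample_token(probs, top_p, rng=random):
    """Draw one token from the renormalized nucleus."""
    nucleus = nucleus_filter(probs, top_p)
    tokens, weights = zip(*nucleus.items())
    return rng.choices(tokens, weights=weights, k=1)[0]


# Made-up next-token distribution, for illustration only.
toy = {"campus": 0.5, "airport": 0.3, "library": 0.15, "moon": 0.05}

print(nucleus_filter(toy, top_p=0.5))  # only "campus" survives
print(nucleus_filter(toy, top_p=0.7))  # "campus" and "airport" survive
print(nucleus_filter(toy, top_p=1.0))  # all four tokens survive
```

A smaller `top_p` shrinks the set of candidate tokens at every step, which is worth keeping in mind when comparing the generations below.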


### Question 1 (5 points)

Can you use the ChatCompletion function to generate a story about an Indian student studying abroad (e.g., at George Mason University)? Please use the default setting and generate only one story.

In [6]:
prompt = "Please write a story about an Indian student studying abroad at George Mason University."

ChatCompletion(prompt)


Response:  **Title: A Journey of Dreams**

Arjun Mehta stood in front of the sprawling campus of George Mason University, the crisp autumn air swirling around him. He had arrived a week ago from his hometown of Pune, India, and everything still felt like a dream. The vibrant colors of the leaves contrasted with the red brick buildings, and he felt both exhilarated and apprehensive. The journey that had brought him here was both long and arduous, yet filled with hope—a desire to pursue his studies in computer science, a field he had been passionate about since childhood.

Back in Pune, Arjun had spent countless hours coding, diligently practicing for the entrance exams, and dreaming of studying abroad. When he received his acceptance letter from George Mason, it felt surreal. It was a testament to his hard work, and he was determined to make the most of it.

On his first day of classes, Arjun walked into the lecture hall, filled with students from diverse backgrounds. As he took a seat,

### Question 2 (5 points)

Now, can you do the same but try to get 2 generations with `top_p` set to be 1?

In [7]:
prompt = "Please write a story about an Indian student studying abroad at George Mason University."

ChatCompletion(prompt, n_samples=2, top_p=1)

The call returns 2 responses:

*Response 0*:  **Title: A Journey Beyond Borders**

Aarav Sharma stood at the bustling Dulles International Airport, feeling a thrilling mix of excitement and nerves coursing through him. Arriving from India, he was on the cusp of a new chapter in his life as he set foot in the United States to pursue his Master’s degree in Environmental Science at George Mason University. 

His journey hadn't been easy; it had been filled with late-night study sessions, countless applications, and the bittersweet farewells of friends and family back home. Aarav could still hear his mother’s voice, reminding him to call often, and his father’s encouraging words about how this was a tremendous opportunity to explore the world and all that it had to offer.

As Aarav stepped out into the crisp air of Virginia, a wave of reality washed over him. The sprawling campus of George Mason, with its modern buildings and lush green spaces, felt both intimidating and inviting. The echo

### Question 3 (5 points)

How about 2 generations with `top_p` set to be 0.5?

In [8]:
prompt = "Please write a story about an Indian student studying abroad at George Mason University."

ChatCompletion(prompt, n_samples=2, top_p=0.5)

The call returns 2 responses:

*Response 0*:  **Title: A Journey Beyond Borders**

Riya Sharma stood at the entrance of George Mason University, her heart racing with a mix of excitement and nervousness. Having traveled over 8,000 miles from her hometown in Pune, India, to Fairfax, Virginia, she felt like a tiny fish in a vast ocean. The sprawling campus buzzed with students from all over the world, each one seemingly confident and at ease in their surroundings. Riya took a deep breath, adjusted her backpack, and stepped forward into her new life.

The first few weeks were a whirlwind of orientation sessions, new classes, and meeting people from diverse backgrounds. Riya was pursuing a Master’s in Information Technology, a field she had always been passionate about. Back in India, she had excelled in her undergraduate studies, but the academic environment at George Mason was unlike anything she had experienced before. The professors encouraged open discussions, and students were expect

### Question 4 (10 points)

What did you observe from Q1 - Q3? Did the different `top_p` configurations give you the same or different results? Why?
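
If you want a rough number to support your observations, one simple option is to measure word overlap between two generations. The sketch below uses Jaccard similarity over lowercase word sets; this metric and the helper name `jaccard_similarity` are assumptions chosen for illustration, not something the assignment requires.

```python
def jaccard_similarity(text_a, text_b):
    """Return |A & B| / |A | B| over the lowercase word sets of two texts."""
    words_a = set(text_a.lower().split())
    words_b = set(text_b.lower().split())
    if not words_a and not words_b:
        return 1.0  # two empty texts are treated as identical
    return len(words_a & words_b) / len(words_a | words_b)


# Opening phrases taken from two of the generations shown above.
story_a = "Aarav Sharma stood at the bustling Dulles International Airport"
story_b = "Riya Sharma stood at the entrance of George Mason University"
print(round(jaccard_similarity(story_a, story_b), 3))
```

A value near 1 means the two texts share most of their vocabulary, while a value near 0 means they share very little. Word overlap is a crude proxy, so read the stories as well.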

### Task 2: gpt-4o-mini for Solving Mathematical Problems

The second task we will try is about solving a math problem.

The math problem we consider is:

> Melanie is a door-to-door saleswoman. She sold a third of her vacuum cleaners at the green house, 2 more to the red house, and half of what was left at the orange house. If Melanie has 5 vacuum cleaners left, how many did she start with?

For your reference, the correct answer should be 18, following the reasoning chain below:

> First multiply the five remaining vacuum cleaners by two to find out how many Melanie had before she visited the orange house: 5 * 2 = 10;
> Then add two to figure out how many vacuum cleaners she had before visiting the red house: 10 + 2 = 12;
> Now we know that 2/3 * x = 12, where x is the number of vacuum cleaners Melanie started with. We can find x by dividing each side of the equation by 2/3, which produces x = 18
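
The reasoning chain above can be checked with a few lines of Python. This is a sketch that works backward from the 5 remaining vacuum cleaners and then replays the sales forward; the function name `vacuums_left` is made up for illustration.

```python
from fractions import Fraction


def vacuums_left(start):
    """Replay the three sales forward and return how many vacuums remain."""
    after_green = start - Fraction(start, 3)  # sold a third at the green house
    after_red = after_green - 2               # sold 2 more at the red house
    after_orange = after_red / 2              # sold half of the rest at the orange house
    return after_orange


# Work backward from the 5 vacuum cleaners that are left.
before_orange = 5 * 2                    # 10
before_red = before_orange + 2           # 12
start = before_red / Fraction(2, 3)      # solve 2/3 * x = 12

print(start)                # 18
print(vacuums_left(start))  # 5
```

Using `Fraction` avoids floating-point rounding when dividing by 2/3.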


### Question 5 (5 points)
Can you use the ChatCompletion function and prompt gpt-4o-mini to work out the problem?

In [9]:
math_problem = 'Melanie is a door-to-door saleswoman. She sold a third of her vacuum cleaners at the green house, 2 more to the red house, and half of what was left at the orange house. If Melanie has 5 vacuum cleaners left, how many did she start with?'

ChatCompletion(math_problem)


Response:  Let \( x \) be the number of vacuum cleaners Melanie started with.

1. She sold a third of her vacuum cleaners at the green house:
   \[
   \text{Sold at green house} = \frac{x}{3}
   \]
   After this sale, she has:
   \[
   x - \frac{x}{3} = \frac{2x}{3}
   \]

2. Next, she sold 2 more to the red house:
   \[
   \text{Sold at red house} = 2
   \]
   Now, she has:
   \[
   \frac{2x}{3} - 2
   \]

3. Then, she sold half of what was left at the orange house. The amount left before this sale is:
   \[
   \frac{2x}{3} - 2
   \]
   Therefore, she sold half of this amount at the orange house:
   \[
   \text{Sold at orange house} = \frac{1}{2} \left( \frac{2x}{3} - 2 \right)
   \]

   After selling this, the amount of vacuum cleaners she has left is:
   \[
   \left( \frac{2x}{3} - 2 \right) - \frac{1}{2} \left( \frac{2x}{3} - 2 \right)
   \]

   We can factor it out:
   \[
   \frac{2x}{3} - 2 - \frac{1}{2} \left( \frac{2x}{3} - 2 \right) = \frac{2x}{3} - 2 - \frac{1}{2} \cdot \frac

Did gpt-4o-mini solve the problem correctly? If not, where did it go wrong?

### Question 6 (10 points)

Now, try to get 10 solutions from gpt-4o-mini with `top_p` set to 0.7.

In [10]:
math_problem = 'Melanie is a door-to-door saleswoman. She sold a third of her vacuum cleaners at the green house, 2 more to the red house, and half of what was left at the orange house. If Melanie has 5 vacuum cleaners left, how many did she start with?'

ChatCompletion(math_problem, n_samples=10, top_p=0.7)


The call returns 10 responses:

*Response 0*:  Let \( x \) be the number of vacuum cleaners Melanie started with.

1. She sold a third of her vacuum cleaners at the green house:
   \[
   \text{Sold at green house} = \frac{x}{3}
   \]
   After this sale, she has:
   \[
   x - \frac{x}{3} = \frac{2x}{3}
   \]

2. She sold 2 more at the red house:
   \[
   \text{Sold at red house} = 2
   \]
   After this sale, she has:
   \[
   \frac{2x}{3} - 2
   \]

3. She sold half of what was left at the orange house. The amount left after the red house is:
   \[
   \frac{2x}{3} - 2
   \]
   Half of this amount is:
   \[
   \text{Sold at orange house} = \frac{1}{2} \left( \frac{2x}{3} - 2 \right) = \frac{2x}{6} - 1 = \frac{x}{3} - 1
   \]
   After this sale, she has:
   \[
   \left( \frac{2x}{3} - 2 \right) - \left( \frac{x}{3} - 1 \right)
   \]

4. Now, let's simplify the expression for the amount left:
   \[
   \frac{2x}{3} - 2 - \left( \frac{x}{3} - 1 \right) = \frac{2x}{3} - 2 - \frac{x}{3} + 1
  

You may see multiple different answers produced by gpt-4o-mini. Summarize the answers in the table in the report. Was gpt-4o-mini correct in all of the solutions? If there are any mistakes, what common errors does gpt-4o-mini make?
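
When tallying many sampled solutions, it can help to pull out each final answer automatically. The sketch below uses a simple heuristic (take the last number that appears in each response) and counts how often each answer occurs. The helper names and the three sample strings are made up for illustration, and the heuristic can misfire if a response ends with some other number, so spot-check the results by hand.

```python
import re
from collections import Counter


def extract_final_number(response_text):
    """Return the last number in the text as a string, or None if there is none."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", response_text)
    return numbers[-1] if numbers else None


def tally_answers(responses):
    """Count how often each extracted final answer occurs."""
    return Counter(extract_final_number(r) for r in responses)


# Made-up responses, for illustration only.
samples = [
    "... so x = 18. Melanie started with 18 vacuum cleaners.",
    "... therefore she started with 21 vacuum cleaners.",
    "... x = 18",
]

print(tally_answers(samples).most_common())  # [('18', 2), ('21', 1)]
```

To use it on real output, call `ChatCompletion(..., return_object=True)` and pass `[c.message.content for c in completion.choices]` to `tally_answers`.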

### Question 7 (10 points)

Can you try other ways to prompt gpt-4o-mini to give correct solutions more stably? Be creative!

It may be helpful to design your prompt with multiple math problems in mind. Hence, we provide another one below:

The problem is:
> John drives for 3 hours at a speed of 60 mph and then turns around because he realizes he forgot something very important at home.  He tries to get home in 4 hours but spends the first 2 hours in standstill traffic.  He spends the next half-hour driving at a speed of 30mph, before being able to drive the remaining time of the 4 hours going at 80 mph.  How far is he from home at the end of those 4 hours?

For your reference, the correct answer is 45:
> When he turned around he was 3*60=180 miles from home
> He was only able to drive 4-2=2 hours in the first four hours.
> In half an hour he goes 30*.5=15 miles. He then drives another 2-.5=1.5 hours. In that time he goes 80*1.5=120 miles. So he drove 120+15=135 miles
> So he is 180-135=45 miles away from home
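
The same arithmetic can be replayed in code as a sanity check. This is a minimal sketch whose function and variable names are made up for illustration.

```python
def distance_from_home():
    """Replay the reference solution for the driving problem step by step."""
    outbound = 3 * 60                    # miles from home when he turns around
    moving_time = 4 - 2                  # hours left after the standstill traffic
    slow_leg = 30 * 0.5                  # miles covered in the half-hour at 30 mph
    fast_leg = 80 * (moving_time - 0.5)  # miles covered in the remaining 1.5 hours
    return outbound - (slow_leg + fast_leg)


print(distance_from_home())  # 45.0
```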

Include your prompt design and the answer in the report. Why do you think it works or does not work?

In [2]:
OPENAI_API_KEY= ''
!pip install openai

from openai import OpenAI
client = OpenAI(api_key=OPENAI_API_KEY)

completion = client.chat.completions.create(
  model="gpt-4o-mini",
  messages=[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"}
  ]
)

print(completion.choices[0].message.content)


# Define the math problems and solutions
problem1 = 'Melanie is a door-to-door saleswoman. She sold a third of her vacuum cleaners at the green house, 2 more to the red house, and half of what was left at the orange house. If Melanie has 5 vacuum cleaners left, how many did she start with?'

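# Raw string (r'''...''') keeps the LaTeX backslashes literal; otherwise "\t" in "\text" and "\f" in "\frac" would be read as escape characters.
solution1 = r'''Let \( x \) be the number of vacuum cleaners Melanie started with.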

1. **Sold at the green house:**
   \[
   \text{Sold} = \frac{x}{3}
   \]
   \[
   \text{Remaining} = x - \frac{x}{3} = \frac{2x}{3}
   \]

2. **Sold at the red house:**
   \[
   \text{Sold} = 2
   \]
   \[
   \text{Remaining} = \frac{2x}{3} - 2
   \]

3. **Sold at the orange house:**
   \[
   \text{Sold} = \frac{1}{2} \left( \frac{2x}{3} - 2 \right)
   \]
   \[
   \text{Remaining} = \left( \frac{2x}{3} - 2 \right) - \frac{1}{2} \left( \frac{2x}{3} - 2 \right) = \frac{\left( \frac{2x}{3} - 2 \right)}{2}
   \]

4. **Set up the equation:**
   \[
   \frac{\left( \frac{2x}{3} - 2 \right)}{2} = 5
   \]

5. **Solve for \( x \):**
   \[
   \frac{2x}{3} - 2 = 10
   \]
   \[
   \frac{2x}{3} = 12
   \]
   \[
   2x = 36
   \]
   \[
   x = 18
   \]

**Answer:** Melanie started with **18** vacuum cleaners.
'''

problem2 = 'A farmer has chickens and cows. If there are a total of 30 heads and 100 legs, how many chickens and cows are there?'

# Raw string, so the LaTeX backslashes are kept exactly as written.
solution2 = r'''Let \( c \) be the number of chickens and \( k \) be the number of cows.

1. **Set up the equations:**
   - Total heads:
     \[
     c + k = 30 \quad (1)
     \]
   - Total legs:
     \[
     2c + 4k = 100 \quad (2)
     \]

2. **Solve equation (1) for \( c \):**
   \[
   c = 30 - k
   \]

3. **Substitute \( c \) into equation (2):**
   \[
   2(30 - k) + 4k = 100
   \]
   \[
   60 - 2k + 4k = 100
   \]
   \[
   60 + 2k = 100
   \]

4. **Solve for \( k \):**
   \[
   2k = 40
   \]
   \[
   k = 20
   \]

5. **Find \( c \):**
   \[
   c = 30 - k = 30 - 20 = 10
   \]

**Answer:** There are **10** chickens and **20** cows.
'''

problem3 = 'John drives for 3 hours at a speed of 60 mph and then turns around because he realizes he forgot something very important at home. He tries to get home in 4 hours but spends the first 2 hours in standstill traffic. He spends the next half-hour driving at a speed of 30 mph, before being able to drive the remaining time of the 4 hours going at 80 mph. How far is he from home at the end of those 4 hours?'

# Set up few-shot messages: two solved problems as demonstrations, then the new problem
messages = [
    {"role": "system", "content": "You are an expert mathematician. Solve the following problems step by step, and provide the final answer."},
    {"role": "user", "content": f"Problem 1:\n{problem1}"},
    {"role": "assistant", "content": f"Solution 1:\n{solution1}"},
    {"role": "user", "content": f"Problem 2:\n{problem2}"},
    {"role": "assistant", "content": f"Solution 2:\n{solution2}"},
    {"role": "user", "content": f"Problem 3:\n{problem3}"},
]

# Call the chat completions API directly with the few-shot messages
completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages
)

# Print the assistant's response
print(completion.choices[0].message.content)
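
The message list used above follows a regular pattern: a system prompt, alternating user/assistant pairs as worked examples, and a final user message with the new problem. If you want to try several prompt designs, a small builder can reduce the repetition. The sketch below is one way to do it; the function name `build_few_shot_messages` and the placeholder strings are made up for illustration.

```python
def build_few_shot_messages(system_prompt, examples, question):
    """Build a chat message list from (problem, solution) pairs plus a new question."""
    messages = [{"role": "system", "content": system_prompt}]
    for problem, solution in examples:
        messages.append({"role": "user", "content": problem})
        messages.append({"role": "assistant", "content": solution})
    messages.append({"role": "user", "content": question})
    return messages


# Placeholder strings, for illustration only.
demo = build_few_shot_messages(
    "Solve the following problems step by step.",
    [("Problem A ...", "Solution A ..."), ("Problem B ...", "Solution B ...")],
    "Problem C ...",
)
for message in demo:
    print(message["role"], "->", message["content"])
```

The returned list can be passed as the `messages` argument to `client.chat.completions.create`.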



#### Acknowledgement: The math problems used in this notebook come from the GSM8k dataset: Training Verifiers to Solve Math Word Problems, Cobbe et al., 2021. https://huggingface.co/datasets/gsm8k