# HW 1: System Prompt Engineering with Phoenix 

## 🎯 Assignment Overview

Welcome to Homework Assignment 1! In this assignment, we'll be building the foundation for our recipe chatbot by focusing on prompt engineering and systematic testing. Our goal is to get a working prompt for us to use in Homework 2. 

### What We'll Accomplish:

1. **🤖 Write an Effective System Prompt**: Create a well-crafted system prompt that defines our recipe bot's personality, capabilities, and output format
2. **📊 Expand Our Query Dataset**: Add diverse test queries to evaluate different aspects of our chatbot's performance  
3. **🔬 Set up Phoenix Testing**: Use Phoenix to test and improve our prompt

Let's dive in! 🚀


In [2]:
# Install required packages
import subprocess
import sys


def install_packages():
    packages = [
        "arize-phoenix[evals]",
        "openai",
        "pandas",
        "openinference-instrumentation-openai",
        "nest-asyncio",
    ]

    for package in packages:
        print(f"Installing {package}...")
        subprocess.check_call([sys.executable, "-m", "pip", "install", package])


# Uncomment to install packages
# install_packages()

In [3]:
import nest_asyncio
import pandas as pd

import phoenix as px
from phoenix.client import Client
from phoenix.client.types import PromptVersion

# Enable nested async for Jupyter
nest_asyncio.apply()

print("✅ All imports successful!")


✅ All imports successful!


## 1. Environment Setup 

Before we start prompt engineering, we need to set up Phoenix on our local machine. You can do this either in the notebook with the code below, or from the terminal using `phoenix serve` or `python -m phoenix.server.main serve`.



In [None]:
# Launch Phoenix locally in the notebook
# session = px.launch_app(use_temp_dir=False)
# print(f"🔥 Phoenix is running at: {session.url}")
# print("📱 Click the link above to open the Phoenix UI in your browser")
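
If Phoenix was started from the terminal instead, it can help to confirm the server is reachable before running the remaining cells. This is a minimal sketch, assuming the default local address `http://127.0.0.1:6006` (adjust it if your server runs elsewhere). `phoenix_is_reachable` is a hypothetical helper written for this notebook, not part of the Phoenix API.

```python
import urllib.error
import urllib.request


def phoenix_is_reachable(url: str = "http://127.0.0.1:6006", timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers at `url` (hypothetical helper)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a 2xx status.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, and similar.
        return False


print("Phoenix reachable:", phoenix_is_reachable())
```

If this prints `False`, start Phoenix with one of the commands above and run the check again.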



## 2. Prompt Management - Create and Store Prompts

Now let's create a prompt for our recipe chatbot and store it in Phoenix's prompt hub. We'll start with a super basic prompt that we can iterate on later.


In [5]:
# Create recipe assistant prompts
recipe_prompt_v1 = """
You are a recipe assistant. Your job is to generate easy to follow recipes and cooking advice. You should always provide ingredient lists with precise measurements using standard units. You should always include clear, step-by-step instructions.

User Query: {query}

"""

prompt_name = "recipe-assistant-v1"
prompt = Client().prompts.create(
    name=prompt_name,
    prompt_description="Basic recipe assistant prompt",
    version=PromptVersion(
        [{"role": "system", "content": recipe_prompt_v1}],
        model_name="gpt-4o-mini",
    ),
)


print("\n🎯 Prompt created! You can now view it in the Phoenix UI under the 'Prompts' section.")


🎯 Prompt created! You can now view it in the Phoenix UI under the 'Prompts' section.
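
To see roughly what the model receives for a given query, the template can be rendered locally. This is only a preview sketch: it assumes the `{query}` placeholder is filled by plain string substitution, whereas Phoenix applies its own template handling when the prompt is run. `render_prompt` is a hypothetical helper, and the template text is repeated here so the snippet runs on its own.

```python
# Same text as recipe_prompt_v1, repeated so this sketch is self-contained.
template = """
You are a recipe assistant. Your job is to generate easy to follow recipes and cooking advice. You should always provide ingredient lists with precise measurements using standard units. You should always include clear, step-by-step instructions.

User Query: {query}

"""


def render_prompt(template: str, query: str) -> str:
    """Fill the {query} placeholder (hypothetical helper for local previews)."""
    # str.replace is used instead of str.format so that any other
    # braces in a future version of the prompt are left untouched.
    return template.replace("{query}", query)


preview = render_prompt(template, "Suggest a quick vegan breakfast recipe")
print(preview)
```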


## 3. Create Dataset from CSV

Let's load a starter set of recipe queries from a CSV file and then add 10 new ones.


In [7]:
df = pd.read_csv("data/sample_queries.csv")
df.head()

| Unnamed: 0 | id | query |
|---|---|---|
| 0 | 1 | Suggest a quick vegan breakfast recipe |
| 1 | 2 | I have chicken and rice. what can I cook? |
| 2 | 3 | Give me a dessert recipe with chocolate |


### Expanding Our Test Dataset

Now let's add more diverse queries to test different aspects of our recipe chatbot. According to the homework assignment, we need at least 10 new queries that cover:

- **Specific cuisines** (Italian, Thai, etc.)
- **Dietary restrictions** (vegan, gluten-free)  
- **Available ingredients** ("What can I make with X?")
- **Meal types** (breakfast, lunch, dinner, snacks)
- **Time constraints** ("under 30 minutes")
- **Skill levels** (beginner-friendly)
- **Vague queries** (to test how the bot handles ambiguity)

> **Note**: You can also add queries directly in the Phoenix UI, but we'll do it programmatically here for reproducibility.
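
One way to make sure a set of new queries touches every dimension in the list above is to tag each query with the dimension it targets and check for gaps. The sketch below is illustrative only: the dimension labels, the example queries, and the `missing_dimensions` helper are all hypothetical and are not part of the homework data.

```python
# Dimensions taken from the list above; the short labels are our own naming.
REQUIRED_DIMENSIONS = {
    "cuisine",
    "dietary_restriction",
    "available_ingredients",
    "meal_type",
    "time_constraint",
    "skill_level",
    "vague",
}

# Hypothetical example queries, one per dimension.
tagged_queries = [
    ("cuisine", "Give me a classic Thai green curry recipe"),
    ("dietary_restriction", "I need a gluten-free dinner idea"),
    ("available_ingredients", "What can I make with eggs and spinach?"),
    ("meal_type", "Suggest a healthy afternoon snack"),
    ("time_constraint", "Dinner I can cook in under 30 minutes"),
    ("skill_level", "A beginner-friendly bread recipe"),
    ("vague", "I'm hungry, any ideas?"),
]


def missing_dimensions(tagged):
    """Return the required dimensions that no query is tagged with."""
    covered = {dimension for dimension, _ in tagged}
    return sorted(REQUIRED_DIMENSIONS - covered)


print("Missing dimensions:", missing_dimensions(tagged_queries))
```

An empty list means every dimension has at least one query; anything else names the gaps to fill before uploading.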

In [8]:
# Add 10 new queries to the dataset
new_queries = [
    "How do I make sushi rice?",
    "What's the secret to fluffy pancakes?",
    "How do I properly season a cast iron pan?",
    "What's the best way to cook salmon?",
    "How do I make fresh pasta sauce?",
    "What spices are essential for Indian cooking?",
    "How do I make sourdough starter?",
    "What's the difference between baking and broiling?",
    "How do I make homemade stock?",
    "What's the best way to store fresh herbs?",
]

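# Continue the id sequence so the new rows don't end up with missing ids
next_id = int(df["id"].max()) + 1
new_df = pd.DataFrame(
    {
        "id": range(next_id, next_id + len(new_queries)),
        "query": new_queries,
    }
)

# Append to existing dataset
full_df = pd.concat([df, new_df], ignore_index=True)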

client = px.Client()
dataset = client.upload_dataset(
    dataframe=full_df,
    dataset_name="recipe-queries",
    input_keys=["query"],
)

📤 Uploading dataset...
💾 Examples uploaded: http://127.0.0.1:6006/datasets/RGF0YXNldDoy/examples
🗄️ Dataset version ID: RGF0YXNldFZlcnNpb246Mg==


## Testing Our Prompt in Phoenix

Now that we have our prompt and dataset uploaded to Phoenix, it's time to test and iterate!

### 🧪 Testing in the Phoenix UI:

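1. **Navigate to Phoenix**: Open the Phoenix UI in your browser (the address printed when the dataset was uploaded, e.g. `http://127.0.0.1:6006`)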
2. **Find Your Prompt**: Go to the "Prompts" section and find your `recipe-assistant-v1` prompt  
3. **Create an Experiment**: Use your `recipe-queries` dataset to test the prompt
4. **Evaluate Results**: Look for:
   - Proper markdown formatting
   - Clear ingredient lists and instructions
   - Appropriate creativity vs. accuracy balance
   - Handling of edge cases (vague queries, dietary restrictions, etc.)

### 🔍 What to Look For:
- **Formatting**: Are recipes properly structured with `##` headers and `###` sections?
- **Completeness**: Do responses include ingredients, instructions, and helpful tips?
- **Safety**: Are unsafe requests properly declined?
- **Creativity**: Does the bot provide helpful variations and substitutions?
- **Consistency**: Are measurements in standard units and instructions clear?
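
Some of the formatting and completeness questions above can be spot-checked in code once responses are copied out of Phoenix. The following is a rough sketch, not a Phoenix feature: `check_recipe_format` is a hypothetical helper, the sample response is made up, and the checks are simple text heuristics that will miss edge cases.

```python
import re


def check_recipe_format(response: str) -> dict[str, bool]:
    """Run simple structural checks on a recipe response (hypothetical helper)."""
    return {
        # A level-2 markdown header, e.g. "## Fluffy Pancakes"
        "has_title_header": bool(re.search(r"^## \S", response, re.MULTILINE)),
        # At least one level-3 header, e.g. "### Ingredients"
        "has_section_headers": bool(re.search(r"^### \S", response, re.MULTILINE)),
        # The word "ingredient" appears somewhere in the response
        "mentions_ingredients": "ingredient" in response.lower(),
        # At least one numbered step such as "1. Whisk ..."
        "has_numbered_steps": bool(re.search(r"^\s*\d+\.\s", response, re.MULTILINE)),
    }


# Made-up response used only to demonstrate the checks.
sample_response = """## Fluffy Pancakes

### Ingredients
- 1 cup all-purpose flour
- 1 cup milk

### Instructions
1. Whisk the flour and milk together.
2. Cook on a hot griddle until golden.
"""

for check, passed in check_recipe_format(sample_response).items():
    print(f"{check}: {'pass' if passed else 'FAIL'}")
```

Checks like these only cover structure. Safety and creativity still need a human read or a dedicated evaluator.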

### ✨ Iteration Tips:
- Test edge cases: very vague queries, dietary restrictions, cooking methods
- Try different phrasings of the same request
- Look for patterns in failures and adjust your prompt accordingly
- Use Phoenix's evaluation tools to systematically measure performance
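
To spot patterns in failures, it can help to jot down the failure modes seen for each query and tally them. A minimal sketch with the standard library follows; the review notes and failure labels are invented placeholders for illustration, not real results from this prompt.

```python
from collections import Counter

# Placeholder review notes: one entry per query, with hand-assigned failure labels.
# These labels are hypothetical and do not reflect actual model behavior.
review_notes = [
    {"query": "Suggest a quick vegan breakfast recipe", "failures": []},
    {"query": "I have chicken and rice. what can I cook?", "failures": ["missing_measurements"]},
    {
        "query": "Give me a dessert recipe with chocolate",
        "failures": ["missing_measurements", "no_markdown_headers"],
    },
]

# Count how often each failure label appears across all notes.
failure_counts = Counter(
    label for note in review_notes for label in note["failures"]
)

# Most frequent failure modes first: these are the best candidates for a prompt fix.
for label, count in failure_counts.most_common():
    print(f"{label}: {count}")
```

The most frequent label is usually the first thing to address in the next prompt revision.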

After testing several queries and feeling confident about your prompt's performance, move on to applying it to the local codebase!

## 4. Assignment Complete! 

Excellent work! We've successfully completed Homework 1 by building and testing our recipe chatbot prompt entirely within the Phoenix platform.

### 🎯 What We Accomplished:
- ✅ Created a comprehensive system prompt with clear formatting guidelines
- ✅ Set up Phoenix for testing and prompt management  
- ✅ Built and uploaded a diverse test dataset with 13+ queries
- ✅ Tested our prompt systematically in the Phoenix UI
- ✅ Iterated on prompt performance using Phoenix's evaluation tools
- ✅ Established a repeatable workflow for prompt development (a stored prompt, an uploaded test dataset, and experiments in Phoenix)



### 🚀 Ready for Homework 2!
With our solid foundation of prompt engineering and systematic evaluation, we're now prepared to tackle Homework 2. 


