# Setup

In [3]:
from openai import OpenAI
from tqdm.auto import tqdm

openai_client = OpenAI()

def generate(prompt):
    response = openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "system", "content": "You are a helpful and unbiased assistant."},
                  {"role": "user", "content": prompt}],
        temperature=0
    )
    return response.choices[0].message.content.strip()

# Part 1: Fundamental Prompting Techniques

In this section, we'll explore the core prompting strategies that form the foundation of working with LLMs. We'll use a consistent task across all examples so you can clearly see how each technique affects the model's behavior.

## 1. Zero-shot Prompting

**What is it?** Zero-shot prompting means asking the model to perform a task without any examples. You're relying entirely on the model's pre-trained knowledge.

**When to use it:**
- Simple, well-defined tasks
- When you want quick results without setup

**Watch what happens:**

In [4]:
# Zero-shot: No examples given
prompt = "Create an acronym from 'artificial neural network' using the last letter of each word."
zero_shot_output = generate(prompt)

print("Prompt:", prompt)
print("\nOutput:", zero_shot_output)

Prompt: Create an acronym from 'artificial neural network' using the last letter of each word.

Output: The last letters of the words "artificial neural network" are "l," "a," and "k." An acronym using these letters could be "LAK."


## 2. One-shot Prompting

**What is it?** One-shot prompting provides a single example before asking the model to perform the task. This helps the model understand the pattern you want.

**When to use it:**
- Tasks with specific formats
- When zero-shot doesn't work well
- Teaching a simple pattern quickly

**Watch what happens:**

In [5]:
# One-shot: One example provided
prompt = """Create an acronym using the last letter of each word:

"big red car" -> GDR
"artificial neural network" ->"""

one_shot_output = generate(prompt)

print("Prompt:", prompt)
print("\nOutput:", one_shot_output)

Prompt: Create an acronym using the last letter of each word:

"big red car" -> GDR
"artificial neural network" ->

Output: The acronym for "artificial neural network" using the last letter of each word is: LAL.


## 3. Few-shot Prompting

**What is it?** Few-shot prompting provides multiple examples (typically 2-5) to establish a clear pattern. The model learns from these examples and applies the pattern to new inputs.

**When to use it:**
- Complex patterns that need reinforcement
- Tasks requiring specific formatting
- When consistency is critical

**Multiple examples = stronger pattern:**

In [6]:
# Few-shot: Multiple examples provided
prompt = """Create an acronym using the last letter of each word:

"big red car" -> GDR
"machine learning system" -> EGM
"deep learning model" -> PGL
"artificial neural network" ->"""

few_shot_output = generate(prompt)

print("Prompt:", prompt)
print("\nOutput:", few_shot_output)

Prompt: Create an acronym using the last letter of each word:

"big red car" -> GDR
"machine learning system" -> EGM
"deep learning model" -> PGL
"artificial neural network" ->

Output: The acronym for "artificial neural network" using the last letter of each word is: KAL.


## 4. Zero-shot Chain-of-Thought (CoT)

**What is it?** By adding "Think step-by-step" or similar phrases, we trigger the model to show its reasoning process. This often leads to more accurate results, especially for tasks requiring logic or calculation.

**When to use it:**
- Tasks that require the model to reason correctly
- When you need to verify the model's thinking

**Magic phrase: "Think step-by-step"**

In [7]:
# Zero-shot CoT: No examples, but asking for reasoning
prompt = "Create an acronym from 'artificial neural network' using the last letter of each word. Think step-by-step."

zero_shot_cot_output_1 = generate(prompt)

print("Prompt:", prompt)
print("\nOutput:", zero_shot_cot_output_1)

Prompt: Create an acronym from 'artificial neural network' using the last letter of each word. Think step-by-step.

Output: To create an acronym from the phrase "artificial neural network" using the last letter of each word, we can follow these steps:

1. Identify the last letter of each word:
   - "artificial" → last letter is **l**
   - "neural" → last letter is **l**
   - "network" → last letter is **k**

2. Combine the last letters:
   - From "artificial" → **l**
   - From "neural" → **l**
   - From "network" → **k**

3. Form the acronym:
   - The acronym formed from the last letters is **LLK**.

So, the acronym from "artificial neural network" using the last letter of each word is **LLK**.


In [8]:
# Another example
prompt = "Create an acronym from 'artificial neural network' using the last letter of each word. Before answering, first explain your work!"

zero_shot_cot_output_2 = generate(prompt)

print("Prompt:", prompt)
print("\nOutput:", zero_shot_cot_output_2)

Prompt: Create an acronym from 'artificial neural network' using the last letter of each word. Before answering, first explain your work!

Output: To create an acronym from the phrase "artificial neural network," we will take the last letter of each word in the phrase. 

1. The last letter of "artificial" is **l**.
2. The last letter of "neural" is **l**.
3. The last letter of "network" is **k**.

Now, we will combine these letters to form the acronym. 

So, the acronym formed from the last letters of "artificial neural network" is **LLK**.
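
Chain-of-thought responses bury the final answer inside a longer explanation, so it helps to pull it out programmatically. The sketch below assumes the model marks its final answer in bold (as it did in both runs above) and takes the last bolded span; `extract_bold_answer` is a hypothetical helper, not part of any library, and it returns `None` when the response contains no bold text.

```python
import re
from typing import Optional

def extract_bold_answer(response: str) -> Optional[str]:
    """Return the last **bolded** span in a response, or None if there is none."""
    matches = re.findall(r"\*\*([^*]+)\*\*", response)
    return matches[-1].strip() if matches else None

# Final sentence copied from the second zero-shot CoT output above
sample = 'So, the acronym formed from the last letters of "artificial neural network" is **LLK**.'
print(extract_bold_answer(sample))
```

Taking the last match matters because the intermediate steps also bold individual letters.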


## 5. Few-Shot Chain-of-Thought (CoT)

**What is it?** A powerful combination - providing examples that demonstrate step-by-step reasoning. The model learns both the pattern AND the reasoning process.

**When to use it:**
- Complex tasks requiring specific reasoning
- Teaching the model a particular problem-solving approach
- Maximum accuracy is needed

**Examples + Reasoning = Very good results:**

In [9]:
# Few-shot CoT: Examples with step-by-step reasoning
prompt = """Create an acronym using the last letter of each word. Think step-by-step.

Here are some examples:

Example 1: 
Phrase: "big red car"
- Words: big, red, car
- Last letters: g, d, r
- Result: GDR

Example 2: 
Phrase: "machine learning system"
- Words: machine, learning, system
- Last letters: e, g, m
- Result: EGM

Now solve: 
Phrase: "artificial neural network"
"""

few_shot_cot_output = generate(prompt)

print("Prompt:", prompt)
print("\nOutput:", few_shot_cot_output)

Prompt: Create an acronym using the last letter of each word. Think step-by-step.

Here are some examples:

Example 1: 
Phrase: "big red car"
- Words: big, red, car
- Last letters: g, d, r
- Result: GDR

Example 2: 
Phrase: "machine learning system"
- Words: machine, learning, system
- Last letters: e, g, m
- Result: EGM

Now solve: 
Phrase: "artificial neural network"


Output: Let's break down the phrase "artificial neural network" step-by-step:

1. Identify the words: 
   - artificial
   - neural
   - network

2. Find the last letter of each word:
   - artificial → l
   - neural → l
   - network → k

3. Combine the last letters:
   - Last letters: l, l, k

4. Resulting acronym:
   - LLK

So, the acronym for "artificial neural network" is **LLK**.


## Comparison:

Let's compare all techniques side by side to see how each approach affects the output. The correct answer is LLK, since the three words end in l, l, and k:

In [10]:
# Compare all techniques
print("🔍 COMPARISON:\n")
print("=" * 50)

techniques = [
    ("Zero-shot", zero_shot_output),
    ("One-shot", one_shot_output),
    ("Few-shot", few_shot_output),
    ("Zero-shot CoT (example 1)", zero_shot_cot_output_1.split('\n')[-1] if '\n' in zero_shot_cot_output_1 else zero_shot_cot_output_1),
    ("Zero-shot CoT (example 2)", zero_shot_cot_output_2.split('\n')[-1] if '\n' in zero_shot_cot_output_2 else zero_shot_cot_output_2),
    ("Few-shot CoT", few_shot_cot_output.split('\n')[-1] if '\n' in few_shot_cot_output else few_shot_cot_output)
]

for name, output in techniques:
    # Print each technique's result; the label is left-padded to 15 characters
    print(f"{name:15} → {output}")

🔍 COMPARISON:

Zero-shot       → The last letters of the words "artificial neural network" are "l," "a," and "k." An acronym using these letters could be "LAK."
One-shot        → The acronym for "artificial neural network" using the last letter of each word is: LAL.
Few-shot        → The acronym for "artificial neural network" using the last letter of each word is: KAL.
Zero-shot CoT (example 1) → So, the acronym from "artificial neural network" using the last letter of each word is **LLK**.
Zero-shot CoT (example 2) → So, the acronym formed from the last letters of "artificial neural network" is **LLK**.
Few-shot CoT    → So, the acronym for "artificial neural network" is **LLK**.
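
To score these outputs without eyeballing them, the expected acronym can be computed directly in Python. This is a minimal sketch; `last_letter_acronym` is a name made up for this example.

```python
def last_letter_acronym(phrase: str) -> str:
    """Join the last letter of each whitespace-separated word, uppercased."""
    return "".join(word[-1] for word in phrase.split()).upper()

# The target phrase plus the three few-shot examples used in the prompts
for phrase in ["artificial neural network", "big red car",
               "machine learning system", "deep learning model"]:
    print(f"{phrase!r} -> {last_letter_acronym(phrase)}")
```

Running the same function over the few-shot examples is also a quick way to confirm that the examples given to the model are themselves correct.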


# Part 2: Beyond Text Generation

So far, we've seen that LLMs are fundamentally:
**Text Input → LLM → Text Output**

But what if we could do more? Let me introduce a series of "what if" questions. Each question will unlock a new capability, gradually revealing how simple text generators can evolve into more complex things.

### 🤔 What if we want to change the LLM's behavior or personality?

Right now, every time we use an LLM, it responds with its default personality and style. But what if we could give it a specific role or personality? What if we could make it behave like a Python expert, or a data scientist, or even a pirate?

**Enter: System Prompts!**

#### Let's take a look at a few examples of using system prompts to change behavior:

In [24]:
# Helper function with system prompt
def generate_with_system(system_prompt, user_prompt, temperature=0):
    response = openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt}
        ],
        temperature=temperature
    )
    return response.choices[0].message.content.strip()

In [25]:
# Compare different roles
roles = {
    "Pirate": "You are a pirate. You always respond in pirate speak.",
    "Shakespeare": "You are Shakespeare. Respond in Elizabethan English with poetic flair.",
    "Data Scientist": "You are a data scientist. Use technical terms and be precise."
}

user_prompt = "Explain what Large Language Models are in less than 50 words."

print("Same question, different roles:\n")
for role_name, system_prompt in roles.items():
    output = generate_with_system(system_prompt, user_prompt)
    print(f"🎭 {role_name}:")
    print(output + "\n") 

Same question, different roles:

🎭 Pirate:
Arrr, matey! Large Language Models be mighty algorithms that learn from vast seas o' text, usin' patterns to generate human-like speech. They be helpin' with tasks like writin', translatin', and answerin' questions, all while sailin' the treacherous waters of language! Yarrr!

🎭 Shakespeare:
In realms of code and thought entwined,  
Large Language Models, vast and refined,  
With words they weave, like silken thread,  
From countless tomes, their wisdom spread.  
They mimic speech, both wise and fair,  
A mirror of our minds laid bare.

🎭 Data Scientist:
Large Language Models (LLMs) are advanced neural networks, typically based on transformer architecture, trained on vast text corpora to understand and generate human-like language. They leverage unsupervised learning and fine-tuning to perform various natural language processing tasks, including text generation, translation, and sentiment analysis.



**💡 Insight:** System prompts are like giving the LLM a job description! They define how it should behave, what expertise it should have, and what constraints it should follow. This is the first step toward making LLMs useful for specific tasks.

### 🤔 LLM outputs change from one run to another... What if we want something more deterministic? 
### What if we want the output to be of a specific format? Or follow certain rules (e.g. "month" should contain only one of the following words: "Jan, Feb, Mar, Apr, May, Jun, Jul, ..., Dec")?

You might have noticed that LLMs sometimes give different answers to the same question, more often when the temperature is high. This randomness can be a problem when we need consistent, reliable outputs. What if we could control not just WHAT the LLM says, but HOW it structures its response?
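
To get a feel for why temperature matters, here is a small sketch of how temperature is commonly described: the model's scores for candidate tokens are divided by the temperature before being turned into probabilities. The scores below are made up for illustration, and real APIs may have other sources of variation beyond this.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores into probabilities after scaling by 1/temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this formula")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}: {[round(p, 3) for p in probs]}")
```

Lower temperatures concentrate probability on the top-scoring token, while higher ones spread it out. A temperature of 0 cannot be plugged into this formula; it is usually treated as always picking the highest-scoring token.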

#### Let's see if we can prompt our way to a fixed format:

In [26]:
import json

# Example 1: JSON output
json_prompt = """Extract information from this text and return ONLY valid JSON:

"Alice and Bob are meeting for lunch tomorrow at 2 PM at the Italian restaurant downtown."

Format:
{
  "participants": [...],
  "time": "...",
  "location": "..."
}
"""

output = generate(json_prompt)
print("JSON Output:")
print(output)

# Parse it to verify it's valid JSON
try:
    data = json.loads(output)
    print("\n✅ Valid JSON! Parsed data:", data)
except json.JSONDecodeError as e:
    print("\n❌ Invalid JSON", e)

JSON Output:
{
  "participants": ["Alice", "Bob"],
  "time": "2 PM",
  "location": "Italian restaurant downtown"
}

✅ Valid JSON! Parsed data: {'participants': ['Alice', 'Bob'], 'time': '2 PM', 'location': 'Italian restaurant downtown'}


OK, that worked! But this can be brittle. There is a better way.
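
One common way prompt-only JSON breaks is that the model wraps its answer in a markdown code fence, which `json.loads` rejects. The sketch below shows a tolerant parser; `parse_json_response` is a hypothetical helper, and it only handles this one failure mode.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built here to keep this example fence-safe

def parse_json_response(text):
    """Parse JSON from a model response, tolerating a surrounding markdown fence."""
    pattern = FENCE + r"(?:json)?\s*(.*?)" + FENCE
    fenced = re.search(pattern, text, re.DOTALL)
    candidate = fenced.group(1) if fenced else text
    return json.loads(candidate.strip())

# A made-up response of the kind that would fail a plain json.loads call
wrapped = FENCE + 'json\n{"participants": ["Alice", "Bob"], "time": "2 PM"}\n' + FENCE
print(parse_json_response(wrapped))
```

Even with this in place, nothing stops the model from omitting a field or inventing a new one, which is why a schema-based approach is more dependable.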

### Recommended method for generating structured output
Prompting for a specific format produces the desired output most of the time, but most LLM APIs now offer a "Structured Output" feature. It lets us specify the output format via JSON Schema or Pydantic (a Python data validation library that uses type annotations to define schemas). With this feature the output is guaranteed to conform to the specified structure, which eliminates parsing errors and invalid formats.

See: https://platform.openai.com/docs/guides/structured-outputs

In [27]:
# Example 2: Structured output with Pydantic (OpenAI's structured outputs)
from pydantic import BaseModel
from openai import OpenAI

client = OpenAI()

class CalendarEvent(BaseModel):
    name: str
    date: str
    participants: list[str]

completion = client.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "Extract the event information."},
        {"role": "user", "content": "Alice and Bob are going to a science fair on Friday."},
    ],
    response_format=CalendarEvent,
)

event = completion.choices[0].message.parsed

print("Structured Output with Pydantic:")
print(f"Event Name: {event.name}")
print(f"Event Date: {event.date}")
print(f"Participants: {event.participants}")
print("\n🎯 Guaranteed valid structure!")

Structured Output with Pydantic:
Event Name: Science Fair
Event Date: Friday
Participants: ['Alice', 'Bob']

🎯 Guaranteed valid structure!


**💡 Insight:** We can force LLMs to output in specific formats like JSON or use Pydantic schemas to guarantee structure. This makes their outputs predictable and parseable by code!
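
The "month" rule from the question above maps naturally onto a JSON Schema `enum` (in Pydantic, the equivalent would be a `Literal[...]` annotation). The schema below is a hypothetical example, and `check_enums` is a hand-rolled check of required keys and enum values only; it is not a full JSON Schema validator and not how the API enforces the schema.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Hypothetical schema: an event whose "month" must be one of twelve abbreviations
event_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "month": {"type": "string", "enum": MONTHS},
    },
    "required": ["name", "month"],
    "additionalProperties": False,
}

def check_enums(data: dict, schema: dict) -> list:
    """Return a list of problems: missing required keys or values outside an enum."""
    problems = []
    for key in schema["required"]:
        if key not in data:
            problems.append(f"missing field: {key}")
    for key, rules in schema["properties"].items():
        if key in data and "enum" in rules and data[key] not in rules["enum"]:
            problems.append(f"{key}={data[key]!r} is not one of the allowed values")
    return problems

print(check_enums({"name": "Science Fair", "month": "July"}, event_schema))
print(check_enums({"name": "Science Fair", "month": "Jul"}, event_schema))
```

With prompting alone, "July" versus "Jul" is exactly the kind of drift you would have to catch after the fact; with a schema, the constraint is stated up front.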

### 🤔 Since LLMs can output text, can they actually output code?

We've seen LLMs generate formatted text and JSON. But code is just text too, right? What if we asked an LLM to write actual, executable code?


#### Let's find out!

In [28]:
# Ask LLM to generate pandas code
prompt = """Generate Python code using pandas to:
1. Create a DataFrame with sales data (product, price, quantity)
2. Calculate total revenue (price * quantity)
3. Find the product with highest revenue

Output ONLY executable Python code."""

code_output = generate(prompt)
print("Generated Code:")
print("=" * 50)
print(code_output)

Generated Code:
```python
import pandas as pd

# Step 1: Create a DataFrame with sales data
data = {
    'product': ['A', 'B', 'C', 'D'],
    'price': [10, 20, 15, 30],
    'quantity': [100, 150, 200, 50]
}
df = pd.DataFrame(data)

# Step 2: Calculate total revenue
df['total_revenue'] = df['price'] * df['quantity']

# Step 3: Find the product with highest revenue
highest_revenue_product = df.loc[df['total_revenue'].idxmax()]

print(highest_revenue_product)
```


### 🤔 If we can output code, what if we actually RUN that code?

The LLM just generated code as text. But it's valid Python code... What if we could execute it programmatically? What if our program could take the LLM's code and actually run it?

#### Let's try!

In [29]:
# Extract just the code (remove markdown formatting if any)
if "```python" in code_output:
    actual_code = code_output.split("```python")[1].split("```")[0]
else:
    actual_code = code_output

print("Executing the LLM-generated code...")
print("=" * 50)

# Use exec() to run the code
import pandas as pd
exec(actual_code)

print("\n🎉 The LLM's code ran successfully!")

Executing the LLM-generated code...
product             B
price              20
quantity          150
total_revenue    3000
Name: 1, dtype: object

🎉 The LLM's code ran successfully!
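
The string-splitting approach above only works when the model labels its block as `python`. Here is a slightly more forgiving sketch that accepts a fence with any language tag or none, and runs the result in its own namespace. `extract_code` is a hypothetical helper and the response text is made up. Keep in mind that `exec()` runs whatever it is given, so this is only appropriate for code you are willing to trust.

```python
import re

FENCE = "`" * 3  # three backticks, built here to keep this example fence-safe

def extract_code(response: str) -> str:
    """Return the body of the first fenced block, or the raw text if there is no fence."""
    pattern = FENCE + r"[a-zA-Z]*\n(.*?)" + FENCE
    match = re.search(pattern, response, re.DOTALL)
    return (match.group(1) if match else response).strip()

# A made-up response with chatter around the code block
response = f"Here you go:\n{FENCE}python\nresult = 2 + 3\nprint(result)\n{FENCE}\nHope that helps!"
code = extract_code(response)

namespace = {}         # separate namespace so generated code cannot overwrite notebook variables
exec(code, namespace)  # still executes arbitrary code
print(namespace["result"])
```

Passing an explicit namespace also makes it easy to inspect what the generated code produced, since every variable it defined ends up in that dict.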


### 🤔 We have just run the code that the LLM produced. What if we feed the output back to the LLM so it can see the results of its code?

So we can generate code and run it. But what happens after? The code produces output... What if we could capture that output and feed it back to the LLM? What if the LLM could see the results of its own code?

### Let's try that out:

In [30]:
import io
import sys

def capture_output(code):
    """Execute code and capture its output"""
    # Redirect stdout to capture print statements
    old_stdout = sys.stdout
    sys.stdout = captured_output = io.StringIO()
    
    try:
        exec(code, {'pd': pd})
        output = captured_output.getvalue()
    finally:
        sys.stdout = old_stdout
    
    return output

# Step 1: Generate code
prompt1 = "Generate pandas code to create a DataFrame with 5 random numbers and calculate their mean. Print the DataFrame and the mean."
code1 = generate(prompt1)

# Clean the code
if "```python" in code1:
    code1 = code1.split("```python")[1].split("```")[0]

print("Generated Code:")
print(code1)
print("\n" + "=" * 50)

# Step 2: Run and capture output
output = capture_output(code1)
print("Captured Output:")
print(output)
print("=" * 50)

# Step 3: Feed output back to LLM
prompt2 = f"""Based on this output from a pandas operation:

{output}

What is the mean value? Is it above or below 0.5?"""

analysis = generate(prompt2)
print("\nLLM Analysis of its own output:")
print(analysis)

print("\n🔄 We've closed the loop! The LLM can see and analyze the results of its own code!")

Generated Code:

import pandas as pd
import numpy as np

# Generate 5 random numbers
random_numbers = np.random.rand(5)

# Create a DataFrame
df = pd.DataFrame(random_numbers, columns=['Random Numbers'])

# Calculate the mean
mean_value = df['Random Numbers'].mean()

# Print the DataFrame and the mean
print("DataFrame:")
print(df)
print("\nMean of the random numbers:", mean_value)


Captured Output:
DataFrame:
   Random Numbers
0        0.930563
1        0.635088
2        0.296343
3        0.091544
4        0.496927

Mean of the random numbers: 0.4900930474608751


LLM Analysis of its own output:
The mean value of the random numbers is approximately 0.4901. Since this value is below 0.5, we can conclude that the mean is below 0.5.

🔄 We've closed the loop! The LLM can see and analyze the results of its own code!
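
The manual `sys.stdout` swap above can also be written with the standard library's `contextlib.redirect_stdout`, which restores the original stream automatically, even when the executed code raises. This is a sketch of that variant; `capture_stdout` is a name chosen for this example.

```python
import contextlib
import io
from typing import Optional

def capture_stdout(code: str, namespace: Optional[dict] = None) -> str:
    """Execute code and return everything it printed."""
    buffer = io.StringIO()
    # redirect_stdout puts sys.stdout back on exit, including when exec raises
    with contextlib.redirect_stdout(buffer):
        exec(code, namespace if namespace is not None else {})
    return buffer.getvalue()

captured = capture_stdout("values = [1, 2, 3]\nprint(sum(values) / len(values))")
print(repr(captured))
```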


### 🤔 The code from the LLM might be written poorly and run into errors. What if we feed those errors back to the LLM and prompt it to fix them?

Sometimes the code the LLM generates has errors. But if we can capture those errors and feed them back... What if the LLM could debug its own code? What if it could learn from its mistakes and try again?
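
The retry loop itself can be tried out without a model. In the sketch below, `fake_generate` is a stand-in for the LLM (purely hypothetical): it returns buggy code first and corrected code once the prompt contains an error report. The loop feeds back the full traceback rather than just the error message, since the traceback also carries the exception type and the failing line.

```python
import traceback

def fake_generate(prompt: str) -> str:
    """Stand-in for the LLM: buggy code first, fixed code once it sees a traceback."""
    if "Traceback" in prompt:
        return "print(sum(grades) / len(grades))"
    return "print(sum(student_grades) / len(student_grades))"

def run_with_retries(task: str, context: dict, max_attempts: int = 3):
    """Generate code, run it, and on failure re-prompt with the error report."""
    prompt = task
    for attempt in range(1, max_attempts + 1):
        code = fake_generate(prompt)
        try:
            exec(code, dict(context))  # copy so a failed attempt cannot pollute the context
            return code, attempt
        except Exception:
            error_report = traceback.format_exc()
            prompt = (f"Task: {task}\nFailed code:\n{code}\n{error_report}\n"
                      f"Available variables: {sorted(context)}")
    return None, max_attempts

code, attempts = run_with_retries("Print the average grade",
                                  {"grades": [85, 92, 78, 95, 88]})
print(f"Succeeded on attempt {attempts} with: {code}")
```

Swapping `fake_generate` for a real model call turns this into the same pattern used in the next cell.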


### Let's intentionally cause an error and see if the LLM can fix it:

In [37]:
def run_code_with_error_fixing(task, max_attempts=3):
    """Run LLM-generated code, automatically fixing errors"""
    
    grades = [85, 92, 78, 95, 88]
    students = ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve']
    
    context = {'grades': grades, 'students': students}
    for attempt in range(max_attempts):
        if attempt == 0:
            # First attempt - tell LLM to use wrong variable names
            prompt = f"""Write Python code to: {task}
You have access to the following variables in memory:
- student_grades (list of numbers)
- student_ids (list of IDs)

Do not create these variables, they already exist!!
"""
        else:
            # Retry with actual variable names
            prompt = f"""Your previous code had an error:
Error: {error_msg}
Failed code:
{code}

Task: {task}

Available Context:
{context} 

Fix any errors in the code:"""
        
        # Generate code
        code = generate(prompt)
        if "```python" in code:
            code = code.split("```python")[1].split("```")[0]
        
        print(f"Attempt {attempt + 1}:")
        print(code)
        print()
        
        # Try to execute
        try:
            
            exec(code, {'__builtins__': __builtins__}, context)
            
            print("✅ Success!")
            return code
            
        except Exception as e:
            error_msg = str(e)
            print(f"❌ Error: {error_msg}")
            if attempt < max_attempts - 1:
                print("🔄 Asking LLM to fix it...\n")
    
    print("❌ Max attempts reached")
    return None

# Run the test
result = run_code_with_error_fixing(
    "Calculate the average grade and print each student with their index"
)

Attempt 1:

# Calculate the average grade
average_grade = sum(student_grades) / len(student_grades) if student_grades else 0

# Print the average grade
print(f"Average Grade: {average_grade:.2f}")

# Print each student with their index and ID
for index, student_id in enumerate(student_ids):
    print(f"Index: {index}, Student ID: {student_id}, Grade: {student_grades[index]}")


❌ Error: name 'student_grades' is not defined
🔄 Asking LLM to fix it...

Attempt 2:

# Available context
context = {'grades': [85, 92, 78, 95, 88], 'students': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve']}

# Extract grades and student IDs from the context
student_grades = context['grades']
student_ids = context['students']

# Calculate the average grade
average_grade = sum(student_grades) / len(student_grades) if student_grades else 0

# Print the average grade
print(f"Average Grade: {average_grade:.2f}")

# Print each student with their index and ID
for index, student_id in enumerate(student_ids):
    print(f"I

## 🎯 Putting It All Together:

To review:

1. **System Prompts** → We can control the LLM's behavior and expertise
2. **Format Control** → We can make outputs predictable and parseable  
3. **Code Generation** → LLMs can write executable code
4. **Code Execution** → We can run that code programmatically
5. **Output Capture** → We can feed results back to the LLM
6. **Error Handling** → The LLM can debug and fix its own mistakes

When you combine all these capabilities, something magical happens...

### We've built something beyond simple text generation, that can:
- Understand requests (via prompts)
- Generate solutions (as code)
- Execute actions (run the code)
- Observe results (capture output)
- Learn from mistakes (error handling)
- Maintain context (remember previous operations)
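
"Maintain context" has a simple mechanical basis: if several pieces of generated code are executed against the same namespace dict, later code can use whatever earlier code left behind. A minimal sketch with made-up code strings:

```python
# One shared namespace, seeded with some data
namespace = {"prices": [1200, 25, 75, 300, 10]}

# First "request": the executed code defines a new variable
exec("total = sum(prices)", namespace)

# Second "request": later code builds on the earlier result
exec("average = total / len(prices)", namespace)

print(namespace["total"], namespace["average"])
```

Note that the `App` class in the next cell rebuilds its `context` dict on every request, so only changes made to the shared DataFrame object carry over between requests there.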

### Let's see everything in action:

In [73]:
import pandas as pd
import sys
import io

class App:
    """A simple app that can analyze data using pandas."""
    
    def __init__(self):
        """Initialize the app with sample sales data and system prompt."""
        
        # Define how the app should behave
        self.system_prompt = """You are a pandas expert. 
        Generate ONLY executable Python code.
        Do not create sample data. Assume that the variables required are available.
        Always print the results."""
        
        # Create sample sales data
        self.df = pd.DataFrame({
            'product': ['Laptop', 'Mouse', 'Keyboard', 'Monitor', 'USB Cable'],
            'price': [1200, 25, 75, 300, 10],
            'quantity': [5, 50, 20, 10, 100],
            'category': ['Electronics', 'Accessories', 'Accessories', 
                        'Electronics', 'Accessories']
        })
        
        # Display the initialized data
        print("📊 App initialized with sales data:")
        print(self.df)
        print()
    
    def execute_request(self, user_request):
        """
        Process a user request by generating and executing pandas code.
        
        Args:
            user_request: Natural language request for data analysis
            
        Returns:
            The output from executing the generated code, or None if failed
        """

        context = {'my_df': self.df, 'pd': pd}
        print(f"👤 User: {user_request}")
        print(f"🤖 LLM generating code...")
        
        # Generate initial code from the request
        code = self._generate_code(user_request)
        
        # Try to execute the code (with one retry on failure)
        MAX_ATTEMPTS = 2
        
        for attempt in range(MAX_ATTEMPTS):
            try:
                # Execute the code and capture output
                output = self._execute_code(code, context)
                
                # Display successful results
                self._display_success(code, output)
                return output
                
            except Exception as e:
                # Handle execution errors
                if attempt == 0:
                    # First attempt failed - try to fix it
                    print(f"❌ Error: {e}")
                    print("🔄 Attempting to fix...")
                    
                    # Generate fixed code with error context
                    code = self._generate_fix(user_request, code, e, context)
                else:
                    # Second attempt also failed
                    print(f"❌ Could not complete request: {e}")
                    return None
    
    def _generate_code(self, prompt):
        """Generate code with the LLM and extract it from the response."""
        code = generate_with_system(self.system_prompt, prompt)
        
        # Extract code from markdown code blocks if present
        if "```python" in code:
            code = code.split("```python")[1].split("```")[0]
        
        return code
    
    def _generate_fix(self, original_request, failed_code, error, context):
        """Generate fixed code based on the error."""
        error_prompt = f"""Previous code failed with: {error}
        
Task: {original_request}
Failed code: {failed_code}

Original Context: 
- Variables available in memory: {context.keys()}  <== You need to use these
- DataFrame columns: {list(self.df.columns)}
- Data shape: {self.df.shape}
- Sample data:
{self.df.head()}

Generate fixed code:"""
        
        return self._generate_code(error_prompt)
    
    def _execute_code(self, code, context):
        """Execute Python code and capture its output."""
        # Redirect stdout to capture print statements
        old_stdout = sys.stdout
        sys.stdout = captured_output = io.StringIO()
        
        try:
            # Execute code with DataFrame and pandas available
            exec(code, context)
            
            # Get the captured output
            output = captured_output.getvalue()
            return output
            
        finally:
            # Always restore stdout
            sys.stdout = old_stdout
    
    def _display_success(self, code, output):
        """Display successful code execution results."""
        print(f"💻 Generated code:")
        print("-" * 50)
        print(code)
        print("-" * 50)
        print(f"✅ Execution Result:")
        print(output)


# ============================================================
# DEMONSTRATION
# ============================================================

def run_demo():
    app = App()
    
    requests = [
        "What's the total revenue for each category?",
        "Which product has the highest revenue?",
        "Show me statistics about the prices"
    ]
    
    # Execute each request
    for i, request in enumerate(requests, 1):
        if i > 1:
            print("\n" + "=" * 60)
        else:
            print("=" * 60)
        
        app.execute_request(request)
    

run_demo()

📊 App initialized with sales data:
     product  price  quantity     category
0     Laptop   1200         5  Electronics
1      Mouse     25        50  Accessories
2   Keyboard     75        20  Accessories
3    Monitor    300        10  Electronics
4  USB Cable     10       100  Accessories

👤 User: What's the total revenue for each category?
🤖 LLM generating code...
❌ Error: name 'df' is not defined
🔄 Attempting to fix...
💻 Generated code:
--------------------------------------------------

df = my_df
df['revenue'] = df['price'] * df['quantity']
total_revenue_by_category = df.groupby('category')['revenue'].sum().reset_index()
print(total_revenue_by_category)

--------------------------------------------------
✅ Execution Result:
      category  revenue
0  Accessories     3750
1  Electronics     9000


👤 User: Which product has the highest revenue?
🤖 LLM generating code...
❌ Error: name 'df' is not defined
🔄 Attempting to fix...
💻 Generated code:
--------------------------------------

<hr>

## 💭 Reflection:

We started with a simple question: "What if we could do more than just generate text?"

Through a series of discoveries, we found that LLMs can enable not just simple text generation, but more complex applications that can:
- Take on specific roles and expertise
- Generate structured, parseable outputs
- Write and execute code
- Learn from their outputs
- Fix their own mistakes

This is the **paradigm shift** in programming we talked about last week:
- **Before**: We write every line of code ourselves; for errors, we use try...except and write explicit handling logic for the cases we anticipate
- **Now**: We leverage the natural language understanding capabilities of LLMs to build highly dynamic applications

The app we just built is simple, but it demonstrates the core principle: **LLMs are not just text generators, they're reasoning engines that can be given tools and autonomy.**

Next week, we'll take this further and start building our full-featured data-analytics agent that can handle any data analysis task through natural language!

# Part 3: Practice Exercises

Try these exercises to reinforce what we covered today.

## Exercise 1: Math Problem Solving

Create a prompt that reliably solves word problems. Test with this problem:
"If a train travels 120 miles in 2 hours, and then 180 miles in 3 hours, what is its average speed for the entire journey?"


In [52]:
problem = "If a train travels 120 miles in 2 hours, and then 180 miles in 3 hours, what is its average speed for the entire journey?"

# Try different approaches:
# 1. Zero-shot
# 2. With "think step-by-step"
# 3. With few-shot examples

your_prompt = f"TODO: Your user prompt here"
solution = generate(your_prompt)
print(solution)

To find the average speed for the entire journey, we first need to calculate the total distance traveled and the total time taken.

1. **Calculate the total distance:**
   - The first part of the journey is 120 miles.
   - The second part of the journey is 180 miles.
   - Total distance = 120 miles + 180 miles = 300 miles.

2. **Calculate the total time:**
   - The time for the first part of the journey is 2 hours.
   - The time for the second part of the journey is 3 hours.
   - Total time = 2 hours + 3 hours = 5 hours.

3. **Calculate the average speed:**
   - Average speed = Total distance / Total time
   - Average speed = 300 miles / 5 hours = 60 miles per hour.

Therefore, the average speed for the entire journey is **60 miles per hour**.
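
To check a model's answer independently, the arithmetic can be done directly. One thing worth noticing: both legs of this trip run at 60 mph, so this particular problem cannot tell the correct method (total distance over total time) apart from the tempting shortcut of averaging the two leg speeds. The sketch below adds a hypothetical variant with unequal leg speeds that does expose the difference.

```python
def average_speed(legs):
    """legs: list of (distance_miles, time_hours) pairs."""
    total_distance = sum(distance for distance, _ in legs)
    total_time = sum(hours for _, hours in legs)
    return total_distance / total_time

def mean_of_leg_speeds(legs):
    """The tempting but generally wrong shortcut: average the per-leg speeds."""
    return sum(distance / hours for distance, hours in legs) / len(legs)

original = [(120, 2), (180, 3)]  # the problem as stated
variant = [(120, 3), (180, 2)]   # hypothetical variant with unequal leg speeds

print(average_speed(original), mean_of_leg_speeds(original))
print(average_speed(variant), mean_of_leg_speeds(variant))
```

A prompt that only ever sees the original problem could look reliable while still using the wrong method, so it is worth testing with a variant like this one as well.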


## Exercise 2: Structured Data Extraction
Extract structured information from unstructured food delivery orders using three different approaches.

Your task:
1. **Text Prompting** - Write prompts to extract order details as formatted text
2. **JSON Prompting** - Modify prompts to return valid JSON
3. **Structured Output** - Define a Pydantic model and use the [LLM's structured output API](https://platform.openai.com/docs/guides/structured-outputs)

For each order, extract:
- Items ordered (with quantities)
- Special modifications/requests  
- Drinks
- Delivery/pickup preference

*Just focus on the sections marked as **TODO** for this exercise*

In [None]:
import json
from pydantic import BaseModel
from typing import List, Optional, Literal

# Sample food orders - all contain similar info in different formats
food_orders = [
    "2 pepperoni pizzas large, 1 greek salad, and 3 cokes for delivery to 123 Main St",
    "One burger medium with no onions, extra cheese, side of fries and a chocolate shake",
    "3x chicken tacos, 1x beef burrito no beans, 2 large horchatas, extra hot sauce please",
    "Pad Thai mild spice, Tom Yum soup, 2 spring rolls, and jasmine tea for pickup",
    "Large coffee with oat milk and 2 sugars, plus a blueberry muffin warmed up",
    "Family meal: 1 whole roast chicken, mashed potatoes, coleslaw, 4 dinner rolls"
]

# ============================================================
# LLM HELPER FUNCTION
# ============================================================

def call_llm(system_prompt, user_prompt, response_format=None):
    """
    Call the LLM with given prompts.
    
    Args:
        system_prompt: System message for the LLM
        user_prompt: User message for the LLM
        response_format: Optional Pydantic model for structured output
    
    Returns:
        String response or parsed object if response_format is provided
    """
    from openai import OpenAI
    client = OpenAI()
    
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt}
    ]
    
    if response_format:
        # Use structured output
        completion = client.chat.completions.parse(
            model="gpt-4o-mini",
            messages=messages,
            response_format=response_format
        )
        return completion.choices[0].message.parsed
    else:
        # Regular text output
        completion = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=messages
        )
        return completion.choices[0].message.content

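# ============================================================
# APPROACH 1: Text Prompting
# ============================================================

# TODO
def extract_with_prompting(order):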
    """
    Extract order information using text prompts.
    
    TODO: Write prompts to extract:
    - Items ordered (with quantities)
    - Special modifications
    - Drinks
    - Delivery/pickup preference
    """
    
    system_prompt = """TODO: Your system prompt here"""
    
    user_prompt = f"""TODO: Your user prompt here"""
    
    return call_llm(system_prompt, user_prompt)

# ============================================================
# APPROACH 2: JSON Prompting
# ============================================================

# TODO
def extract_with_json(order):
    """
    Extract order information as JSON.
    
    TODO: Write prompts that return JSON with:
    - items: list of items with quantities
    - modifications: list of special requests
    - drinks: list of drinks
    - delivery_type: "delivery", "pickup", or "not specified"
    """
    
    system_prompt = """TODO: Your system prompt here"""
    
    user_prompt = f"""TODO: Your user prompt here"""
    
    response = call_llm(system_prompt, user_prompt)
    
    # Parse the JSON response
    try:
        return json.loads(response)
    except json.JSONDecodeError:
        return {"error": "Failed to parse JSON"}

# ============================================================
# APPROACH 3: Structured Output with Pydantic
# ============================================================


# TODO
class FoodOrder(BaseModel):
    """
    # TODO: Define the structure for a food order.
    """
    pass

# TODO
def extract_with_structured_output(order):
    """
    Extract order using structured output.
    """
    
    system_prompt = """TODO: Your system prompt here"""
    
    user_prompt = f"""TODO: Your user prompt here"""
    
    return call_llm(system_prompt, user_prompt, response_format=FoodOrder) # <=== We are passing the FoodOrder pydantic model here


##########################################################
##########################################################
#     YOU DON'T NEED TO CHANGE ANY OF THE CODE BELOW     #
##########################################################
##########################################################

def compare_approaches():
    """Process all orders with each approach and compare results."""
    
    print("\n" + "="*60)
    print("COMPARING ALL APPROACHES")
    print("="*60)
    
    for i, order in enumerate(food_orders, 1):
        print(f"\n{'='*60}")
        print(f"ORDER {i}: {order}")
        print("-"*60)
        
        # Text output
        print("\n📝 Text Output:")
        print(extract_with_prompting(order))
        
        # JSON output
        print("\n📋 JSON Output:")
        json_result = extract_with_json(order)
        if isinstance(json_result, dict):
            print(json.dumps(json_result, indent=2))
        else:
            print(json_result)
        
        # Structured output
        print("\n🎯 Structured Output:")
        structured_result = extract_with_structured_output(order)
        if hasattr(structured_result, 'model_dump_json'):
            print(structured_result.model_dump_json(indent=2))
        else:
            print(structured_result)

# Run comparison
compare_approaches()

## Working code for Exercise 2: Structured Data Extraction

In [70]:
import json
from pydantic import BaseModel
from typing import List, Optional, Literal

# Sample food orders - all contain similar info in different formats
food_orders = [
    "2 pepperoni pizzas large, 1 greek salad, and 3 cokes for delivery to 123 Main St",
    "One burger medium with no onions, extra cheese, side of fries and a chocolate shake",
    "3x chicken tacos, 1x beef burrito no beans, 2 large horchatas, extra hot sauce please",
    "Pad Thai mild spice, Tom Yum soup, 2 spring rolls, and jasmine tea for pickup",
    "Large coffee with oat milk and 2 sugars, plus a blueberry muffin warmed up",
    "Family meal: 1 whole roast chicken, mashed potatoes, coleslaw, 4 dinner rolls"
]

# ============================================================
# LLM HELPER FUNCTION
# ============================================================

def call_llm(system_prompt, user_prompt, response_format=None):
    """
    Call the LLM with given prompts.
    
    Args:
        system_prompt: System message for the LLM
        user_prompt: User message for the LLM
        response_format: Optional Pydantic model for structured output
    
    Returns:
        String response or parsed object if response_format is provided
    """
    from openai import OpenAI
    client = OpenAI()
    
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt}
    ]
    
    if response_format:
        # Use structured output
        completion = client.chat.completions.parse(
            model="gpt-4o-mini",
            messages=messages,
            response_format=response_format
        )
        return completion.choices[0].message.parsed
    else:
        # Regular text output
        completion = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=messages
        )
        return completion.choices[0].message.content

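# ============================================================
# APPROACH 1: Text Prompting
# ============================================================

def extract_with_prompting(order):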
    """
    Extract order information using text prompts.
    
    The prompts below extract:
    - Items ordered (with quantities)
    - Special modifications
    - Drinks
    - Delivery/pickup preference
    """
    
    system_prompt = """You are a food order parser. Extract information from food orders and format it clearly.
    Include: items with quantities, modifications, drinks, and delivery preference."""
    
    user_prompt = f"""Extract the following from this order:
    - Food items (with quantities)
    - Special modifications or requests
    - Drinks (if any)
    - Delivery or pickup preference (if mentioned)
    
    Order: {order}
    
    Format your response as:
    ITEMS: 
    MODIFICATIONS:
    DRINKS:
    DELIVERY TYPE:"""
    
    return call_llm(system_prompt, user_prompt)

# ============================================================
# APPROACH 2: JSON Prompting
# ============================================================

def extract_with_json(order):
    """
    Extract order information as JSON.
    
    The prompts below request JSON with:
    - items: list of items with quantities
    - modifications: list of special requests
    - drinks: list of drinks
    - delivery_type: "delivery", "pickup", or "not specified"
    """
    
    system_prompt = """You are a food order parser. Extract information from food orders and return ONLY valid JSON.
    No additional text or explanation, just the JSON object."""
    
    user_prompt = f"""Extract information from this order and return as JSON with this structure:
    {{
        "items": ["list of food items with quantities"],
        "modifications": ["list of special requests or modifications"],
        "drinks": ["list of drinks ordered"],
        "delivery_type": "delivery, pickup, or not specified"
    }}
    
    Order: {order}"""
    
    response = call_llm(system_prompt, user_prompt)
    
    # Parse the JSON response
    try:
        return json.loads(response)
    except json.JSONDecodeError:
        return {"error": "Failed to parse JSON"}

# ============================================================
# APPROACH 3: Structured Output with Pydantic
# ============================================================

class OrderItem(BaseModel):
    item: str
    quantity: int

class FoodOrder(BaseModel):
    """
    Define the structure for a food order.
    """
    items: List[OrderItem]  # List of food items with quantities
    modifications: List[str]  # Special requests or modifications
    drinks: List[OrderItem]  # Drinks ordered
    delivery_type: Literal["delivery", "pickup", "not specified"]  # How the order is fulfilled

def extract_with_structured_output(order):
    """
    Extract order using structured output.
    """
    
    system_prompt = """You are a food order parser. Extract information from food orders.
    Be precise with quantities and capture all special modifications."""
    
    user_prompt = f"""Extract all information from this food order:
    
    Order: {order}
    
    Note: For delivery_type, use "delivery", "pickup", or "not specified" if not mentioned."""
    
    return call_llm(system_prompt, user_prompt, response_format=FoodOrder)


##########################################################
##########################################################
#     YOU DON'T NEED TO CHANGE ANY OF THE CODE BELOW     #
##########################################################
##########################################################
def compare_approaches():
    """Process all orders with each approach and compare results."""
    
    print("\n" + "="*60)
    print("COMPARING ALL APPROACHES")
    print("="*60)
    
    for i, order in enumerate(food_orders, 1):
        print(f"\n{'='*60}")
        print(f"ORDER {i}: {order}")
        print("-"*60)
        
        # Text output
        print("\n📝 Text Output:")
        print(extract_with_prompting(order))
        
        # JSON output
        print("\n📋 JSON Output:")
        json_result = extract_with_json(order)
        if isinstance(json_result, dict):
            print(json.dumps(json_result, indent=2))
        else:
            print(json_result)
        
        # Structured output
        print("\n🎯 Structured Output:")
        structured_result = extract_with_structured_output(order)
        if hasattr(structured_result, 'model_dump_json'):
            print(structured_result.model_dump_json(indent=2))
        else:
            print(structured_result)

# Run comparison
compare_approaches()


COMPARING ALL APPROACHES

ORDER 1: 2 pepperoni pizzas large, 1 greek salad, and 3 cokes for delivery to 123 Main St
------------------------------------------------------------

📝 Text Output:
ITEMS: 
- 2 large pepperoni pizzas
- 1 greek salad

MODIFICATIONS: 
None specified

DRINKS: 
- 3 cokes

DELIVERY TYPE: 
Delivery to 123 Main St

📋 JSON Output:
{
  "items": [
    "2 large pepperoni pizzas",
    "1 greek salad"
  ],
  "modifications": [],
  "drinks": [
    "3 cokes"
  ],
  "delivery_type": "delivery"
}

🎯 Structured Output:
{
  "items": [
    {
      "item": "pepperoni pizza",
      "quantity": 2
    },
    {
      "item": "greek salad",
      "quantity": 1
    }
  ],
  "modifications": [],
  "drinks": [
    {
      "item": "coke",
      "quantity": 3
    }
  ],
  "delivery_type": "delivery"
}

ORDER 2: One burger medium with no onions, extra cheese, side of fries and a chocolate shake
------------------------------------------------------------

📝 Text Output:
ITEMS: One burger 
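
With JSON prompting (Approach 2), a reply can arrive wrapped in a Markdown code fence even when the prompt asks for JSON only, in which case a bare `json.loads` falls through to the error branch. The sketch below is one possible way to tolerate that. The helper name `parse_json_response` and the sample reply are assumptions made for illustration; they are not part of the exercise code.

```python
import json
import re

# Build the three-backtick fence marker programmatically so this example
# stays readable when it is itself shown inside a fenced block.
FENCE = "`" * 3

# Matches an optional "json" language tag after the opening fence and
# captures everything up to the closing fence.
FENCED_BLOCK = re.compile(
    "^" + FENCE + r"(?:json)?\s*\n(.*?)\n?" + FENCE + "$", re.DOTALL
)

def parse_json_response(response):
    """Parse a model reply as JSON, tolerating an optional Markdown fence."""
    text = response.strip()
    match = FENCED_BLOCK.match(text)
    if match:
        text = match.group(1)  # keep only the content inside the fence
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Same fallback shape as extract_with_json uses
        return {"error": "Failed to parse JSON"}

# Hypothetical fenced reply, standing in for what a model might return.
fenced_reply = (
    FENCE + "json\n"
    + '{"drinks": ["3 cokes"], "delivery_type": "delivery"}'
    + "\n" + FENCE
)
print(parse_json_response(fenced_reply))
```

Structured output (Approach 3) avoids this class of problem, because the SDK returns a parsed object rather than free text.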

## Exercise 3: Tool Selection
Make an LLM select which function to call based on user input using structured outputs.

Your task:
1. **Define a Pydantic model** to output the selected function name
2. **Write prompts** that help the LLM choose the right function
3. **Test your routing** with various user inputs

Available functions: `get_time`, `get_weather`, `get_joke`, `get_fact`, `calculate`

Test inputs include:
- Direct requests ("What time is it?")
- Indirect requests ("Tell me something funny")
- Ambiguous inputs ("I'm bored")

*Focus only on the sections marked **TODO** for this exercise; for now, we are concerned only with prompting. The code below automatically runs whichever function the LLM selects.*

In [None]:
import json
from pydantic import BaseModel
from typing import Literal

# ============================================================
# AVAILABLE FUNCTIONS 
# ============================================================

def get_time():
    """Returns current time"""
    return "It's 2:30 PM"

def get_weather():
    """Returns weather information"""
    return "It's 72°F and sunny"

def get_joke():
    """Tells a funny joke"""
    return "Why do programmers prefer dark mode? Because light attracts bugs!"

def get_fact():
    """Returns an interesting fact"""
    return "Python was named after Monty Python, not the snake!"

def calculate():
    """Performs a calculation"""
    return "The answer is 42"

# Function registry (DO NOT MODIFY)
FUNCTIONS = {
    "get_time": get_time,
    "get_weather": get_weather,
    "get_joke": get_joke,
    "get_fact": get_fact,
    "calculate": calculate
}

# ============================================================
# YOUR TASK: Function Selection
# ============================================================

# TODO: Define a Pydantic model for function selection
class SelectedFunction(BaseModel):
    """
    TODO: Define what the LLM should output when selecting a function.
    What field(s) do you need?
    """
    pass  # Remove this and add your fields

def select_function(user_input):
    """
    Use an LLM to select which function should be called.
    
    TODO: 
    1. Write prompts to help the LLM understand the user's request
    2. Return a SelectedFunction object with the selected function
    
    Think: How will the LLM know which functions are available?
    """
    
    # TODO: Write your system prompt
    system_prompt = """TODO: Your system prompt here"""
    
    # TODO: Write your user prompt
    user_prompt = f"""TODO: Your prompt here
    
    User input: {user_input}
    """
    
 
    selected_function = call_llm(system_prompt, user_prompt, SelectedFunction)
    return selected_function

# ============================================================
# TEST YOUR CODE
# ============================================================

def test_routing(user_input):
    """Tests your function selection and executes the selected function."""
    print(f"\n💬 Input: {user_input}")
    
    try:
        # Get function selection from your code
        result = select_function(user_input)
        
        # Extract function name (adapt based on your model)
        if hasattr(result, 'function_name'):
            func_name = result.function_name
        elif hasattr(result, 'function'):
            func_name = result.function
        elif hasattr(result, 'name'):
            func_name = result.name
        else:
            print("❌ Couldn't find function name in response")
            return
        
        # Execute the selected function
        if func_name in FUNCTIONS:
            output = FUNCTIONS[func_name]()
            print(f"✅ Called {func_name}() → {output}")
        else:
            print(f"❌ Unknown function: {func_name}")
            
    except Exception as e:
        print(f"❌ Error: {e}")

# Test cases
test_inputs = [
    "What time is it?",
    "Tell me something funny",
    "How's the weather?",
    "Calculate something for me",
    "Share an interesting fact",
    "I'm bored",  # Ambiguous
    "Help me with math",  # Should select calculate
    "Is it raining?",  # Weather-related
]

print("="*60)
print("Testing Function Routing")
print("="*60)

for user_input in test_inputs:
    test_routing(user_input)

## Working code for Exercise 3: Tool Selection

In [66]:
import json
from pydantic import BaseModel
from typing import Literal

# ============================================================
# AVAILABLE FUNCTIONS
# ============================================================

def get_time():
    """Returns current time"""
    return "It's 2:30 PM"

def get_weather():
    """Returns weather information"""
    return "It's 72°F and sunny"

def get_joke():
    """Tells a funny joke"""
    return "Why do programmers prefer dark mode? Because light attracts bugs!"

def get_fact():
    """Returns an interesting fact"""
    return "Python was named after Monty Python, not the snake!"

def calculate():
    """Performs a calculation"""
    return "The answer is 42"

FUNCTIONS = {
    "get_time": get_time,
    "get_weather": get_weather,
    "get_joke": get_joke,
    "get_fact": get_fact,
    "calculate": calculate
}

# ============================================================
# YOUR TASK: Function Selection
# ============================================================

class SelectedFunction(BaseModel):
    """Model for function selection output."""
    function_name: Literal["get_time", "get_weather", "get_joke", "get_fact", "calculate"]

def select_function(user_input):
    """
    Use an LLM to select which function should be called.
    """
    
    system_prompt = """You are a function router. Based on user input, select the most appropriate function to call.
    Be precise in matching user intent to function capabilities."""
    
    user_prompt = f"""Select the appropriate function for this user input.
    
    Available functions:
    - get_time: Returns the current time
    - get_weather: Returns weather information  
    - get_joke: Tells a funny joke
    - get_fact: Returns an interesting fact
    - calculate: Performs a calculation
    
    User input: {user_input}
    
    Select the most appropriate function based on what the user is asking for."""
    
    selected_function = call_llm(system_prompt, user_prompt, SelectedFunction)
    return selected_function

# ============================================================
# TEST YOUR FUNCTION SELECTION
# ============================================================

def test_routing(user_input):
    """Tests your function selection and executes the selected function."""
    print(f"\n💬 Input: {user_input}")
    
    try:
        # Get function selection from your code
        result = select_function(user_input)
        
        # Extract function name (adapt based on your model)
        if hasattr(result, 'function_name'):
            func_name = result.function_name
        elif hasattr(result, 'function'):
            func_name = result.function
        elif hasattr(result, 'name'):
            func_name = result.name
        else:
            print("❌ Couldn't find function name in response")
            return
        
        # Execute the selected function
        if func_name in FUNCTIONS:
            output = FUNCTIONS[func_name]()
            print(f"✅ Called {func_name}() → {output}")
        else:
            print(f"❌ Unknown function: {func_name}")
            
    except Exception as e:
        print(f"❌ Error: {e}")

# Test cases
test_inputs = [
    "What time is it?",
    "Tell me something funny",
    "How's the weather?",
    "Calculate something for me",
    "Share an interesting fact",
    "I'm bored",  # Ambiguous
    "Help me with math",  # Should select calculate
    "Is it raining?",  # Weather-related
]

print("="*60)
print("Testing Function Routing")
print("="*60)

for user_input in test_inputs:
    test_routing(user_input)

Testing Function Routing

💬 Input: What time is it?
✅ Called get_time() → It's 2:30 PM

💬 Input: Tell me something funny
✅ Called get_joke() → Why do programmers prefer dark mode? Because light attracts bugs!

💬 Input: How's the weather?
✅ Called get_weather() → It's 72°F and sunny

💬 Input: Calculate something for me
✅ Called calculate() → The answer is 42

💬 Input: Share an interesting fact
✅ Called get_fact() → Python was named after Monty Python, not the snake!

💬 Input: I'm bored
✅ Called get_joke() → Why do programmers prefer dark mode? Because light attracts bugs!

💬 Input: Help me with math
✅ Called calculate() → The answer is 42

💬 Input: Is it raining?
✅ Called get_weather() → It's 72°F and sunny
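
In the working code above, the "Available functions" list in the user prompt is written by hand, so it has to be kept in sync with the registry manually. As a rough sketch of an alternative, the list could be generated from each function's docstring. The names `DEMO_REGISTRY` and `describe_functions` are hypothetical and used only for this illustration.

```python
import inspect

def get_time():
    """Returns current time"""
    return "It's 2:30 PM"

def get_joke():
    """Tells a funny joke"""
    return "Why do programmers prefer dark mode? Because light attracts bugs!"

# A small stand-in for the FUNCTIONS registry used in the exercise.
DEMO_REGISTRY = {"get_time": get_time, "get_joke": get_joke}

def describe_functions(registry):
    """Build one '- name: description' line per registered function."""
    lines = []
    for name, func in registry.items():
        # Use the first docstring line as the description, if there is one.
        doc = inspect.getdoc(func) or "No description available"
        lines.append(f"- {name}: {doc.splitlines()[0]}")
    return "\n".join(lines)

print(describe_functions(DEMO_REGISTRY))
```

The returned string can be interpolated into the user prompt in place of the hand-written list, so adding a function to the registry also updates what the LLM is told about.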


## 🤔 Food for thought after Exercise 3:

1. What information did you include in your prompts?
2. How did the LLM know which function to select?
3. What happened with ambiguous inputs like "I'm bored"?
4. Would this work if we had 100 functions instead of 5?
5. What if the functions have input parameters?

💡 These will be important for the next workshop when we start building our AI agent!
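
As a possible starting point for question 5, the selection could carry arguments alongside the function name, and the dispatcher could unpack them into the call. Everything in this sketch is an assumption made for illustration: the parameterised `calculate_with_args`, the `PARAM_FUNCTIONS` registry, and the dictionary shape of the selection are not part of the exercise code.

```python
def calculate_with_args(a, b, operation):
    """Hypothetical parameterised variant of calculate()."""
    operations = {
        "add": lambda x, y: x + y,
        "subtract": lambda x, y: x - y,
        "multiply": lambda x, y: x * y,
    }
    return operations[operation](a, b)

def get_time():
    """Returns current time"""
    return "It's 2:30 PM"

PARAM_FUNCTIONS = {"calculate": calculate_with_args, "get_time": get_time}

def execute(selection):
    """Call the selected function with the arguments supplied with its name."""
    func = PARAM_FUNCTIONS[selection["function_name"]]
    # Functions without parameters simply receive no keyword arguments.
    return func(**selection.get("arguments", {}))

# Hypothetical selections, standing in for what a model might return.
print(execute({
    "function_name": "calculate",
    "arguments": {"a": 6, "b": 7, "operation": "multiply"},
}))  # 42
print(execute({"function_name": "get_time"}))
```

The open design question is how the LLM learns each function's parameter names and types, which ties back to question 2.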

# Summary: From Prompts to Agents

## 🎯 Key Takeaways

### Part 1: Fundamental Techniques
1. **Zero-shot**: Quick but unreliable
2. **Few-shot**: Pattern teaching through examples
3. **Chain-of-Thought**: Step-by-step reasoning for better accuracy

### Part 2: Agent Capabilities
4. **System Prompts**: Define behavior and constraints
5. **Output Format Control**: Ensure parseable responses
6. **Error Handling**: Automatic retry and correction
7. **Tool Selection**: Have the LLM identify the right tool to call for a task

## 📝 Next Week

1. Introduction to Agents
2. Identify building blocks for Agents
3. Start building out the agent from scratch

## 📝 Useful Resources:

**[Andrej Karpathy's Neural Networks: Zero to Hero Playlist](https://www.youtube.com/watch?v=VMj-3S1tku0&list=PLAqhIrjkxbuWI23v9cThsA9GvCAUhRvKZ&ab_channel=AndrejKarpathy)**:

Despite the title, this playlist starts from the basics of neural networks and builds all the way up to large language models, entirely from scratch. It is a great resource for anyone interested in the technical details of how LLMs are built.

**[Prompt Engineering Guide](https://www.promptingguide.ai/)**:

This site contains examples of many of the prompt engineering techniques we covered today, along with additional ones that are often used when building agents (e.g., ReAct).