# Notebook 5: Few-Shot Prompting

In this notebook, you'll learn to use examples to guide model behavior, improve accuracy, and enforce output formats.

## What You'll Learn

- Zero-shot vs few-shot prompting
- Two ways to provide examples
- When few-shot helps most
- How to choose good examples
- How many examples to use

## Reference

- [Mistral Prompting Documentation](https://docs.mistral.ai/guides/prompting/)

---
## Setup

In [None]:
%run 00_setup.ipynb

---
## Section 1: What is Few-Shot Prompting?

**Few-shot prompting** places a handful of worked examples in the prompt to show the model what you want.

| Approach | Examples | Description |
|----------|----------|-------------|
| **Zero-shot** | 0 | Just instructions, no examples |
| **One-shot** | 1 | One example to demonstrate |
| **Few-shot** | 2-5 | Multiple examples for pattern learning |

Examples teach by demonstration, which is often more effective than lengthy instructions.
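
Few-shot prompts follow a regular shape: an instruction, a block of labeled examples, then the new input with its label left open. The sketch below assembles that shape from a list of pairs. `build_few_shot_prompt` and its parameters are hypothetical names introduced for illustration; they are not part of `00_setup.ipynb` or the Mistral client.

```python
def build_few_shot_prompt(instruction, examples, query,
                          input_label="Review", output_label="Sentiment"):
    """Assemble an inline few-shot prompt from (input, output) pairs."""
    parts = [instruction, "", "# Examples"]
    for text, label in examples:
        parts.append(f"{input_label}: {text}")
        parts.append(f"{output_label}: {label}")
        parts.append("")  # blank line between examples
    parts.append("# Now classify this:")
    parts.append(f"{input_label}: {query}")
    parts.append(f"{output_label}:")  # left open for the model to complete
    return "\n".join(parts)


examples = [
    ("I love this product! Best purchase ever.", "positive"),
    ("Completely broken on arrival. Waste of money.", "negative"),
    ("It works as expected. Nothing special.", "neutral"),
]

prompt = build_few_shot_prompt(
    "Classify the sentiment of reviews as positive, negative, or neutral.",
    examples,
    "The battery life is decent but the screen could be brighter.",
)
print(prompt)
```

Keeping the examples in a list rather than hard-coding them in the string makes it easier to add, remove, or reorder them while experimenting.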

In [None]:
# Zero-shot: No examples
zero_shot_prompt = """Classify the sentiment of this review as positive, negative, or neutral.

Review: The battery life is decent but the screen could be brighter."""

print("ZERO-SHOT:")
print("-" * 40)
response_zero = call_mistral(user_prompt=zero_shot_prompt)
print(response_zero)

print("\n" + "=" * 50 + "\n")

# Few-shot: With examples
few_shot_prompt = """Classify the sentiment of reviews as positive, negative, or neutral.

# Examples
Review: I love this product! Best purchase ever.
Sentiment: positive

Review: Completely broken on arrival. Waste of money.
Sentiment: negative

Review: It works as expected. Nothing special.
Sentiment: neutral

# Now classify this:
Review: The battery life is decent but the screen could be brighter.
Sentiment:"""

print("FEW-SHOT:")
print("-" * 40)
response_few = call_mistral(user_prompt=few_shot_prompt)
print(response_few)

---
## Section 2: Two Ways to Provide Examples

### Method 1: Inline Examples (in the prompt)

```
# Examples
Input: Hello, how are you?
Output: {"language": "en"}

Input: Bonjour!
Output: {"language": "fr"}

# Now process:
Input: Hola mundo
Output:
```

### Method 2: Conversation History (using assistant role)

```python
messages = [
    {"role": "system", "content": "Detect language and return JSON."},
    {"role": "user", "content": "Hello, how are you?"},
    {"role": "assistant", "content": '{"language": "en"}'},
    {"role": "user", "content": "Bonjour!"},
    {"role": "assistant", "content": '{"language": "fr"}'},
    {"role": "user", "content": "Hola mundo"}
]
```
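
The message list in Method 2 can also be generated from example pairs instead of being written by hand. This is a minimal sketch; `examples_to_messages` is a hypothetical helper name, and it only builds the list (it does not call any API).

```python
def examples_to_messages(system_prompt, examples, query):
    """Turn (user_text, assistant_text) pairs into a chat message list."""
    messages = [{"role": "system", "content": system_prompt}]
    for user_text, assistant_text in examples:
        # Each example becomes one user turn followed by the ideal assistant reply.
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    # The real input goes last, as an unanswered user turn.
    messages.append({"role": "user", "content": query})
    return messages


examples = [
    ("Hello, how are you?", '{"language": "en"}'),
    ("Bonjour!", '{"language": "fr"}'),
]

messages = examples_to_messages(
    "Detect language and return JSON.", examples, "Hola mundo"
)
for message in messages:
    print(message["role"], "->", message["content"])
```

The resulting list has the same structure as the hand-written one above, so it could be passed to whatever chat helper the notebook setup provides.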

In [None]:
# Method 1: Inline examples
inline_prompt = """Detect the language and return JSON.

# Examples
Input: Hello, how are you?
Output: {"language": "en"}

Input: Bonjour, comment ça va?
Output: {"language": "fr"}

# Now process:
Input: Guten Morgen!
Output:"""

print("METHOD 1 - INLINE EXAMPLES:")
print("-" * 40)
response_inline = call_mistral(user_prompt=inline_prompt, temperature=0)
print(response_inline)

In [None]:
# Method 2: Conversation history (few-shot with message roles)
messages = [
    {"role": "system", "content": "You are a language detector. Return only JSON with the detected language code."},
    {"role": "user", "content": "Hello, how are you?"},
    {"role": "assistant", "content": '{"language": "en"}'},
    {"role": "user", "content": "Bonjour, comment ça va?"},
    {"role": "assistant", "content": '{"language": "fr"}'},
    {"role": "user", "content": "Guten Morgen!"}
]

print("METHOD 2 - CONVERSATION HISTORY:")
print("-" * 40)
response_conv = call_mistral_with_messages(messages, temperature=0)
print(response_conv)

---
## Section 3: When Few-Shot Helps Most

Few-shot prompting is especially useful for:

| Use Case | Why It Helps |
|----------|-------------|
| **Enforcing output format** | Model learns exact structure from examples |
| **Classification with specific categories** | Examples define category boundaries |
| **Style/tone matching** | Demonstrates desired voice |
| **Handling edge cases** | Shows how to handle tricky inputs |
| **Domain-specific tasks** | Teaches specialized terminology/rules |
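
When examples are used to enforce a format, it helps to check mechanically whether a response actually follows it. The sketch below parses a `- Field: value` bullet layout and rejects anything else. The helper name `parse_entity_bullets`, the field list, and the sample strings are assumptions made for illustration; the samples are hand-written, not real model output.

```python
import re

# One bullet per line, e.g. "- Person: John Smith"
BULLET = re.compile(r"^- (Person|Company|Location): (.+)$")


def parse_entity_bullets(response):
    """Parse '- Field: value' lines; return None if any line breaks the format."""
    entities = {}
    for line in response.strip().splitlines():
        match = BULLET.match(line.strip())
        if match is None:
            return None  # free text or an unexpected field name
        field, value = match.groups()
        entities[field] = None if value == "(none)" else value
    return entities


# Hand-written sample responses (placeholders, not model output)
sample = "- Person: Andy Jassy\n- Company: Amazon\n- Location: Seattle"
print(parse_entity_bullets(sample))
print(parse_entity_bullets("Sure! The person is Andy Jassy."))
```

A `None` result is a signal that the examples did not pin the format down tightly enough for that input.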

In [None]:
# Few-shot for format enforcement
format_prompt = """Extract entities from text and return structured data.

# Examples
Text: John Smith works at Google in New York.
Output:
- Person: John Smith
- Company: Google
- Location: New York

Text: Sarah joined Microsoft last year.
Output:
- Person: Sarah
- Company: Microsoft
- Location: (none)

# Now extract:
Text: The CEO of Amazon, Andy Jassy, announced the Seattle expansion.
Output:"""

response = call_mistral(user_prompt=format_prompt, temperature=0)
print(response)

In [None]:
# Few-shot for handling edge cases
edge_case_prompt = """Classify customer intent. Possible categories: billing, technical, general, escalate.

# Examples
Customer: I can't log into my account.
Intent: technical

Customer: Why was I charged twice this month?
Intent: billing

Customer: What are your business hours?
Intent: general

Customer: This is the third time I'm calling about the same issue. I want to speak to a manager NOW!
Intent: escalate

Customer: I've been waiting 2 weeks for a refund you promised. I'm recording this call.
Intent: escalate

# Now classify:
Customer: Your app crashes every time I try to upload a photo. I've already tried reinstalling.
Intent:"""

response = call_mistral(user_prompt=edge_case_prompt, temperature=0)
print(response)

---
## Section 4: Choosing Good Examples

Your examples should be:

1. **Representative** - Similar to real inputs you'll encounter
2. **Diverse** - Cover different categories/scenarios
3. **Include edge cases** - Show handling of tricky inputs
4. **Consistent** - Use the same format across all examples
5. **Correct** - Double-check your examples are accurate!
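
The "diverse" criterion is easy to check by counting how many examples each category has. The following is a rough sketch; `label_coverage` is a hypothetical helper, and the example list is an illustrative set of product reviews.

```python
from collections import Counter


def label_coverage(examples, allowed_labels):
    """Count examples per allowed label, including labels with zero examples."""
    counts = Counter(label for _, label in examples)
    return {label: counts.get(label, 0) for label in allowed_labels}


examples = [
    ("Absolutely love it! Exceeded all my expectations.", "positive"),
    ("Broke after one day. Total waste of money.", "negative"),
    ("Great features but the battery life is disappointing.", "mixed"),
    ("Not what I expected, but customer service was helpful.", "mixed"),
    ("Oh great, another product that doesn't work as advertised.", "negative"),
]

coverage = label_coverage(examples, ["positive", "mixed", "negative"])
print(coverage)

# Any category listed here has no demonstration at all.
missing = [label for label, count in coverage.items() if count == 0]
print("Categories without an example:", missing)
```

A category with zero examples is the clearest warning sign; a heavily lopsided count may also be worth rebalancing.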

In [None]:
# Good examples: diverse, representative, edge cases included
good_examples_prompt = """Rate product reviews on a scale: positive, mixed, negative.

# Examples (covering different scenarios)

## Clear positive
Review: Absolutely love it! Exceeded all my expectations.
Rating: positive

## Clear negative
Review: Broke after one day. Total waste of money.
Rating: negative

## Mixed - positive aspects with caveats
Review: Great features but the battery life is disappointing.
Rating: mixed

## Mixed - criticism with silver lining
Review: Not what I expected, but customer service was helpful.
Rating: mixed

## Edge case - sarcasm
Review: Oh great, another product that doesn't work as advertised.
Rating: negative

# Now rate:
Review: It's okay for the price. You get what you pay for.
Rating:"""

response = call_mistral(user_prompt=good_examples_prompt, temperature=0)
print(response)

---
## Section 5: How Many Examples?

**General guidelines:**

| # Examples | Use Case |
|------------|----------|
| 1-2 | Simple format enforcement |
| 3-5 | Classification, most tasks |
| 6+ | Complex tasks, many categories |

**Trade-offs:**
- More examples = a clearer pattern for the model to follow (usually)
- More examples = more tokens = higher cost/latency
- Diminishing returns after ~5 examples for most tasks

**Test to find your sweet spot!**
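
To see the cost side of the trade-off, you can measure how the prompt grows as examples are added. This sketch uses character count as a crude stand-in for tokens; that is an assumption, since real token counts depend on the model's tokenizer. `prompt_with_k_examples` is a hypothetical helper.

```python
examples = [
    ("Great product, highly recommend!", "positive"),
    ("Terrible quality, don't buy this.", "negative"),
    ("Good features but overpriced.", "mixed"),
]
instruction = "Classify sentiment as positive, negative, or mixed."
test_input = "The product arrived damaged but the replacement process was smooth."


def prompt_with_k_examples(k):
    """Build the prompt using only the first k examples."""
    shots = "\n\n".join(
        f"Text: {text}\nSentiment: {label}" for text, label in examples[:k]
    )
    return f"{instruction}\n\n{shots}\n\nText: {test_input}\nSentiment:"


# Character length is only a rough proxy for token usage.
sizes = [len(prompt_with_k_examples(k)) for k in range(len(examples) + 1)]
for k, size in enumerate(sizes):
    print(f"{k} example(s): {size} characters")
```

Pairing these sizes with the quality of the responses at each `k` gives a simple way to decide where extra examples stop paying for themselves.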

In [None]:
# Comparing 1-shot vs 3-shot
test_input = "The product arrived damaged but the replacement process was smooth."

# 1-shot
one_shot = f"""Classify sentiment as positive, negative, or mixed.

Example:
Text: Great product, highly recommend!
Sentiment: positive

Now classify:
Text: {test_input}
Sentiment:"""

print("1-SHOT:")
print(call_mistral(user_prompt=one_shot, temperature=0))

print("\n" + "=" * 50 + "\n")

# 3-shot
three_shot = f"""Classify sentiment as positive, negative, or mixed.

Examples:
Text: Great product, highly recommend!
Sentiment: positive

Text: Terrible quality, don't buy this.
Sentiment: negative

Text: Good features but overpriced.
Sentiment: mixed

Now classify:
Text: {test_input}
Sentiment:"""

print("3-SHOT:")
print(call_mistral(user_prompt=three_shot, temperature=0))

---
## Section 6: What NOT to Do (Negative Examples)

### ❌ Examples contradict instructions
```
Classify as positive or negative.

Example:
Text: It's okay
Sentiment: neutral   # ← "neutral" wasn't an option!
```

### ❌ Non-representative examples
```
Task: Classify customer support tickets

Example 1: "I love your product!" → positive
Example 2: "Great service!" → positive
Example 3: "Amazing!" → positive
# ← No negative or neutral examples!
```

### ❌ Inconsistent format
```
Input: "Hello" → Output: greeting
"Goodbye" = farewell
Input "Thanks" ... gratitude
# ← Format changes each time
```
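
Mistakes like these can be caught before the prompt is ever sent. The sketch below checks example lines against a single expected layout and an allowed label set. The one-line `Input: "..." -> Output: label` layout, the helper `find_example_problems`, and the sample lines are all assumptions chosen for illustration.

```python
import re

# Expected layout for every example line (an assumed convention)
LINE_FORMAT = re.compile(r'^Input: "(?P<text>.+)" -> Output: (?P<label>\w+)$')


def find_example_problems(lines, allowed_labels):
    """Return human-readable problems found in a list of example lines."""
    problems = []
    for number, line in enumerate(lines, start=1):
        match = LINE_FORMAT.match(line)
        if match is None:
            problems.append(f"line {number}: format differs from the expected layout")
            continue
        label = match.group("label")
        if label not in allowed_labels:
            problems.append(f"line {number}: label '{label}' is not an allowed category")
    return problems


lines = [
    'Input: "Hello" -> Output: greeting',
    '"Goodbye" = farewell',               # inconsistent format
    'Input: "Thanks" -> Output: thanks',  # label not in the allowed set
]

problems = find_example_problems(lines, {"greeting", "farewell", "gratitude"})
for problem in problems:
    print(problem)
```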

In [None]:
# Bad: examples contradict instructions
bad_prompt = """Classify as "short" or "long" based on word count.

Examples:
Text: Hello
Length: short

Text: This is a medium length sentence.
Length: medium

Text: {test}
Length:""".format(test="The quick brown fox.")

print("BAD - Example uses 'medium' but instructions only allow 'short' or 'long':")
print(call_mistral(user_prompt=bad_prompt, temperature=0))

print("\n" + "=" * 50 + "\n")

# Good: examples match instructions
good_prompt = """Classify as "short" (1-5 words) or "long" (6+ words).

Examples:
Text: Hello there
Length: short

Text: This is a longer sentence with many words.
Length: long

Text: The quick brown fox.
Length:"""

print("GOOD - Examples consistent with instructions:")
print(call_mistral(user_prompt=good_prompt, temperature=0))

---
## Exercise 1: Zero-Shot vs Few-Shot Comparison

Compare performance on a classification task with varying numbers of examples.
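
To compare runs with more than an eyeball check, you could score each set of responses against the labels you expect. This is a minimal sketch: `accuracy` is a hypothetical helper, the expected labels are an assumption about the "right" answers, and `sample_outputs` is made-up placeholder text rather than real model output.

```python
def accuracy(predictions, expected):
    """Fraction of predictions matching the expected label, ignoring case and spaces."""
    if not expected or len(predictions) != len(expected):
        raise ValueError("predictions and expected must be non-empty and equal in length")
    hits = sum(
        pred.strip().lower() == gold.strip().lower()
        for pred, gold in zip(predictions, expected)
    )
    return hits / len(expected)


# Assumed labels, one per test question, in order
expected = ["beginner", "intermediate", "advanced"]

# Placeholder responses; replace with the strings returned by your own runs
sample_outputs = ["Beginner", "beginner ", "advanced"]

print(f"accuracy: {accuracy(sample_outputs, expected):.2f}")
```

Collect the responses from the zero-shot and few-shot loops into lists and score both against the same `expected` list to make the comparison concrete.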

In [None]:
# Task: Classify programming questions by difficulty level
test_questions = [
    "How do I print 'hello world' in Python?",
    "What's the difference between a list and a tuple?",
    "How do I implement a red-black tree with lazy deletion?"
]

# Zero-shot
print("ZERO-SHOT:")
print("-" * 40)
for q in test_questions:
    prompt = f"""Classify this programming question as 'beginner', 'intermediate', or 'advanced'.

Question: {q}
Difficulty:"""
    response = call_mistral(user_prompt=prompt, temperature=0)
    print(f"Q: {q[:50]}...")
    print(f"A: {response}\n")

In [None]:
# TODO: Create a few-shot version with 3 examples
# Then run the same test questions and compare results

few_shot_base = """Classify programming questions as 'beginner', 'intermediate', or 'advanced'.

# Examples
Question: How do I create a variable in JavaScript?
Difficulty: beginner

# Add more examples here...

# Now classify:
Question: {question}
Difficulty:"""

# Uncomment to test
# print("FEW-SHOT:")
# print("-" * 40)
# for q in test_questions:
#     prompt = few_shot_base.format(question=q)
#     response = call_mistral(user_prompt=prompt, temperature=0)
#     print(f"Q: {q[:50]}...")
#     print(f"A: {response}\n")

---
## Exercise 2: Format Enforcement

Use few-shot to enforce a specific JSON output format.
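
A quick way to judge whether the format is being enforced is to parse each response and check its keys. The sketch below does this with the standard `json` module. `check_contact_json` is a hypothetical helper, and the sample strings are hand-written placeholders, not model output.

```python
import json

REQUIRED_KEYS = {"name", "email", "phone"}


def check_contact_json(response):
    """Return the parsed dict if response is JSON with exactly the required keys, else None."""
    try:
        data = json.loads(response)
    except json.JSONDecodeError:
        return None  # extra prose or malformed JSON
    if not isinstance(data, dict) or set(data) != REQUIRED_KEYS:
        return None  # wrong shape or missing/extra fields
    return data


# Hand-written sample responses (placeholders)
valid = '{"name": null, "email": "mike@test.com", "phone": null}'
invalid = 'Here is the JSON: {"email": "mike@test.com"}'

print(check_contact_json(valid))
print(check_contact_json(invalid))
```

Responses that come back as `None` suggest adding an example that covers that kind of input, for instance one with a missing field.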

In [None]:
# Goal: Extract contact info and return as JSON
# Format: {"name": "...", "email": "...", "phone": "..."}

test_texts = [
    "Contact John at john@email.com or call 555-1234",
    "Reach out to Sarah (sarah.jones@company.org)",
    "For help, email support@help.com"
]

# TODO: Create a few-shot prompt that enforces the JSON format
# Include examples that show how to handle missing fields

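# Literal JSON braces are doubled ({{ and }}) because this template is filled
# with str.format(); single braces would raise a KeyError. Do the same in any
# examples you add.
format_prompt = """Extract contact information and return as JSON.

# Examples
Text: Email mike@test.com for more info.
Output: {{"name": null, "email": "mike@test.com", "phone": null}}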

# Add more examples...

Text: {text}
Output:"""

# Uncomment to test
# for text in test_texts:
#     prompt = format_prompt.format(text=text)
#     response = call_mistral(user_prompt=prompt, temperature=0)
#     print(f"Input: {text}")
#     print(f"Output: {response}\n")

---
## Exercise 3: Handling Edge Cases

Create examples that handle tricky edge cases for sentiment analysis.

In [None]:
# Edge cases to handle:
# - Sarcasm ("Oh great, another broken product")
# - Backhanded compliments ("It's good for the price")
# - Questions ("Is this worth buying?")
# - Mixed signals ("Love the design, hate the battery")

edge_cases = [
    "Oh wonderful, it broke on the first day, just what I wanted.",
    "Not bad for something so cheap.",
    "Should I return this or give it another chance?",
    "Amazing camera but the app is garbage."
]

# TODO: Create a few-shot prompt with examples for each edge case type
edge_case_prompt = """Classify review sentiment as positive, negative, or mixed.

# Examples (including edge cases)

# Sarcasm - detect it!
Review: Oh sure, because everyone loves a phone that dies in 2 hours.
Sentiment: negative

# Add more edge case examples...

Review: {review}
Sentiment:"""

# Uncomment to test
# for review in edge_cases:
#     prompt = edge_case_prompt.format(review=review)
#     response = call_mistral(user_prompt=prompt, temperature=0)
#     print(f"Review: {review}")
#     print(f"Sentiment: {response}\n")

---
## Key Takeaways

1. **Few-shot is powerful** for format and behavior control

2. **Two methods**: Inline examples or conversation history

3. **Choose examples carefully** - They're teaching material:
   - Representative
   - Diverse
   - Include edge cases
   - Consistent format

4. **Balance coverage vs token cost** - Usually 3-5 examples is enough

5. **Examples must match instructions** - Don't contradict yourself

---

## Next Steps

Now that you can guide behavior with examples, let's learn how to control output format more precisely.

📚 [Continue to Notebook 6: Controlling Output Format →](06_controlling_output_format.ipynb)