# Notebook 6: Controlling Output Format

In this notebook, you'll learn to get predictable, parseable responses from Mistral models.

## What You'll Learn

- Requesting specific formats (JSON, lists, tables)
- Strengthening format compliance
- Worded scales vs numeric scales (Mistral-specific)
- Handling complex nested structures

## Reference

- [Mistral Prompting Documentation](https://docs.mistral.ai/guides/prompting/)
- [Mistral Structured Outputs](https://docs.mistral.ai/capabilities/structured-outputs/)

---
## Setup

In [None]:
%run 00_setup.ipynb

---
## Section 1: Why Output Format Matters

In production systems, you need:
- **Predictable structure** for downstream processing
- **Parseable output** that code can consume
- **Consistent format** across all requests

Free-form text is hard to parse reliably. Structured output (JSON, specific formats) makes integration much easier.
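
To make the contrast concrete, here is a minimal sketch. Both response strings are written by hand for illustration (they are not real model output), and the regular expression is only one plausible attempt at parsing free text.

```python
import json
import re

# Hypothetical responses, written by hand for illustration (not real model output).
free_form_response = "The person is John Smith, who is 32 and lives in Boston."
json_response = '{"name": "John Smith", "age": 32, "city": "Boston"}'

# Free-form text: the pattern is tied to one specific phrasing.
pattern = re.compile(
    r"is (?P<name>[A-Z][a-z]+ [A-Z][a-z]+), who is (?P<age>\d+) and lives in (?P<city>\w+)"
)
match = pattern.search(free_form_response)
free_form_record = match.groupdict() if match else None

# The same pattern finds nothing as soon as the wording changes.
reworded = "Name: John Smith. Age: 32. City: Boston."
reworded_record = pattern.search(reworded)

# JSON: one call, and the types survive (age is an int, not a string).
json_record = json.loads(json_response)

print(free_form_record)
print(reworded_record)
print(json_record)
```

The regex captures `age` as a string and silently fails on the reworded sentence, while `json.loads` handles any response that sticks to the requested keys.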

In [None]:
# Without format specification - unpredictable
unformatted_prompt = """Extract the person's name, age, and city from this text:
John Smith is 32 years old and lives in Boston."""

print("WITHOUT FORMAT SPECIFICATION:")
print("-" * 40)
for i in range(3):
    response = call_mistral(user_prompt=unformatted_prompt, temperature=0.7)
    print(f"Run {i+1}: {response}\n")

In [None]:
# With format specification - predictable
formatted_prompt = """Extract the person's name, age, and city from this text.
Return as JSON with keys: name, age, city.

Text: John Smith is 32 years old and lives in Boston.

JSON:"""

print("WITH FORMAT SPECIFICATION:")
print("-" * 40)
for i in range(3):
    response = call_mistral(user_prompt=formatted_prompt, temperature=0.7)
    print(f"Run {i+1}: {response}\n")

---
## Section 2: Requesting Formats in Plain Language

You can request common formats with simple instructions:

| Format | Instruction |
|--------|-------------|
| JSON | "Return as JSON with keys: x, y, z" |
| Bulleted list | "Return as a bulleted list" |
| Numbered list | "Return as a numbered list" |
| Table | "Format as a markdown table" |
| Single word | "Respond with a single word" |

This works well for simple cases. When code will parse the output, show the exact format you expect.

In [None]:
# Different format requests
data = "Products: Widget ($10, 100 in stock), Gadget ($25, 50 in stock), Gizmo ($15, 75 in stock)"

formats = {
    "JSON": "Return as a JSON array of objects with keys: name, price, stock.",
    "Bulleted list": "Return as a bulleted list with format: - Name: $Price (Stock units)",
    "Markdown table": "Return as a markdown table with columns: Name, Price, Stock."
}

for format_name, instruction in formats.items():
    prompt = f"{instruction}\n\nData: {data}"
    print(f"FORMAT: {format_name}")
    print("-" * 40)
    response = call_mistral(user_prompt=prompt, temperature=0)
    print(response)
    print("\n" + "=" * 50 + "\n")
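
Once a line format is fixed in the prompt, the response can be parsed line by line. The sketch below assumes the model followed the `- Name: $Price (Stock units)` pattern requested above. The sample response is hand-written, and `parse_product_list` is a hypothetical helper, not part of the notebook's setup.

```python
import re

# Hypothetical response that follows the requested bulleted format.
sample_response = """Here is the list:
- Widget: $10 (100 units)
- Gadget: $25 (50 units)
- Gizmo: $15 (75 units)"""

LINE_PATTERN = re.compile(
    r"^-\s*(?P<name>[^:]+):\s*\$(?P<price>\d+(?:\.\d+)?)\s*\((?P<stock>\d+)\s*units\)\s*$"
)

def parse_product_list(text):
    """Return (products, unparsed_lines) for a bulleted product list."""
    products, unparsed = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        m = LINE_PATTERN.match(line)
        if m:
            products.append({
                "name": m["name"].strip(),
                "price": float(m["price"]),
                "stock": int(m["stock"]),
            })
        else:
            # Keep lines that do not fit so format drift is visible, not silent.
            unparsed.append(line)
    return products, unparsed

products, unparsed = parse_product_list(sample_response)
print(products)
print(unparsed)
```

Returning the unparsed lines separately makes it easy to spot preambles or format drift instead of dropping them unnoticed.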

---
## Section 3: Strengthening Format Compliance

For more reliable format compliance:

1. **Show the exact format** you want
2. **Use few-shot examples** (from Notebook 5)
3. **Be explicit about structure** (field types, required vs optional)
4. **End with a format cue** (e.g., "JSON:" at the end)

In [None]:
# Showing exact format
exact_format_prompt = """Analyze the sentiment of the review.

Return your response in this exact JSON format:
{
    "sentiment": "positive" | "negative" | "neutral",
    "confidence": "high" | "medium" | "low",
    "key_phrases": ["phrase1", "phrase2"]
}

Review: The product quality is excellent, though shipping took longer than expected.

JSON:"""

response = call_mistral(user_prompt=exact_format_prompt, temperature=0)
print(response)

# Try to parse it
import json
try:
    parsed = json.loads(response)
    print("\n✅ Successfully parsed as JSON!")
    print(f"Sentiment: {parsed.get('sentiment')}")
except json.JSONDecodeError as e:
    print(f"\n❌ Failed to parse: {e}")
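
Even with a clear format request, a response may wrap the JSON in a markdown code fence or add a sentence around it, and a direct `json.loads` call then fails. The sketch below shows one tolerant approach. `extract_json` is a hypothetical helper, and the sample strings are hand-written rather than real model output.

```python
import json
import re

FENCE = "`" * 3  # built this way so the example contains no literal fence

def extract_json(text):
    """Parse JSON from a response that may include a code fence or extra prose."""
    # 1. The whole response is already valid JSON.
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass

    # 2. JSON wrapped in a markdown code fence, optionally tagged "json".
    fenced = re.search(FENCE + r"(?:json)?\s*(.*?)" + FENCE, text, re.DOTALL)
    if fenced:
        try:
            return json.loads(fenced.group(1))
        except json.JSONDecodeError:
            pass

    # 3. Fall back to the span from the first "{" to the last "}".
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end > start:
        try:
            return json.loads(text[start:end + 1])
        except json.JSONDecodeError:
            pass

    raise ValueError("No parseable JSON found in response")

# Hypothetical responses for illustration.
wrapped = FENCE + 'json\n{"sentiment": "positive", "confidence": "high"}\n' + FENCE
chatty = 'Here is the analysis: {"sentiment": "neutral"} Hope that helps!'

print(extract_json(wrapped))
print(extract_json(chatty))
```

The fallback in step 3 only covers a single top-level object. If you expect a top-level array, the same idea applies with `[` and `]`.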

In [None]:
# Combining with few-shot for reliability
robust_format_prompt = """Analyze review sentiment and return JSON.

# Examples
Review: I love it!
JSON: {"sentiment": "positive", "confidence": "high", "key_phrases": ["love it"]}

Review: Worst purchase ever.
JSON: {"sentiment": "negative", "confidence": "high", "key_phrases": ["worst purchase"]}

Review: It's fine, nothing special.
JSON: {"sentiment": "neutral", "confidence": "medium", "key_phrases": ["fine", "nothing special"]}

# Now analyze:
Review: Great features but the price is a bit steep for what you get.
JSON:"""

response = call_mistral(user_prompt=robust_format_prompt, temperature=0)
print(response)
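
Writing few-shot examples by hand makes it easy to slip in a malformed JSON line. One option is to build the prompt from Python data so every example is serialized by `json.dumps`. This is a minimal sketch. `build_few_shot_prompt` is a hypothetical helper that mirrors the layout of the prompt above.

```python
import json

def build_few_shot_prompt(examples, review):
    """Assemble a few-shot prompt from (review_text, expected_dict) pairs."""
    lines = ["Analyze review sentiment and return JSON.", "", "# Examples"]
    for text, expected in examples:
        lines.append(f"Review: {text}")
        # json.dumps guarantees each example is valid JSON.
        lines.append(f"JSON: {json.dumps(expected)}")
        lines.append("")
    lines.append("# Now analyze:")
    lines.append(f"Review: {review}")
    lines.append("JSON:")  # format cue at the very end
    return "\n".join(lines)

examples = [
    ("I love it!",
     {"sentiment": "positive", "confidence": "high", "key_phrases": ["love it"]}),
    ("Worst purchase ever.",
     {"sentiment": "negative", "confidence": "high", "key_phrases": ["worst purchase"]}),
]

prompt = build_few_shot_prompt(
    examples, "Great features but the price is a bit steep for what you get."
)
print(prompt)
```

The resulting string can be passed as the user prompt in the same way as the hand-written version.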

---
## Section 4: Worded Scales vs Numeric Scales (Mistral-Specific)

From Mistral's documentation:

> If you need a model to rate something, use a **worded scale** for better performance.

**❌ Numeric scale:**
```
Rate from 1 to 5, where 1 is bad and 5 is excellent.
```

**✅ Worded scale:**
```
Rate using:
- Very Low: highly irrelevant
- Low: not good enough
- Neutral: acceptable
- Good: worth considering
- Very Good: highly relevant
```

You can convert to numeric later if needed.

In [None]:
# Comparing numeric vs worded scales
test_items = [
    "A comprehensive guide to machine learning with code examples",
    "A recipe for chocolate cake",
    "A brief mention of AI in a gardening article"
]

# Numeric scale
print("NUMERIC SCALE (1-5):")
print("-" * 40)
for item in test_items:
    prompt = f"""Rate how relevant this content is to someone learning about AI.
Rate from 1-5 where 1=not relevant, 5=highly relevant.

Content: {item}
Rating (1-5):"""
    response = call_mistral(user_prompt=prompt, temperature=0)
    print(f"Content: {item[:50]}...")
    print(f"Rating: {response}\n")

In [None]:
# Worded scale (Mistral-recommended)
print("WORDED SCALE:")
print("-" * 40)
for item in test_items:
    prompt = f"""Rate how relevant this content is to someone learning about AI.

Use this scale:
- Very Low: completely irrelevant to AI learning
- Low: barely touches on AI topics
- Neutral: somewhat related to AI
- Good: useful for AI learners
- Very Good: highly valuable for AI education

Content: {item}
Relevance:"""
    response = call_mistral(user_prompt=prompt, temperature=0)
    print(f"Content: {item[:50]}...")
    print(f"Rating: {response}\n")

In [None]:
# Converting worded scale to numeric if needed
scale_mapping = {
    "Very Low": 1,
    "Low": 2,
    "Neutral": 3,
    "Good": 4,
    "Very Good": 5
}

worded_response = "Good"  # Example response
numeric_value = scale_mapping.get(worded_response, 0)
print(f"Worded: {worded_response} → Numeric: {numeric_value}")
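
An exact dictionary lookup only works when the response is exactly one of the labels. In practice a response may differ in case, carry trailing punctuation, or repeat the label's description. The sketch below is one way to make the lookup more forgiving. `worded_to_numeric` is a hypothetical helper, and the test strings are hand-written.

```python
SCALE_MAPPING = {"Very Low": 1, "Low": 2, "Neutral": 3, "Good": 4, "Very Good": 5}

def worded_to_numeric(response, mapping=SCALE_MAPPING):
    """Map a worded rating to its numeric value, or None if no label is found."""
    # Drop surrounding whitespace and leading markdown or bullet characters.
    cleaned = response.strip().lstrip("*_\"'- ").lower()
    # Check longer labels first so a multi-word label wins over a shorter one.
    for label in sorted(mapping, key=len, reverse=True):
        if cleaned.startswith(label.lower()):
            return mapping[label]
    return None

for text in ["Good", "very good.", "Very Low: completely irrelevant", "**Low**", "Great!"]:
    print(repr(text), "->", worded_to_numeric(text))
```

Returning `None` for an unrecognized response keeps failures distinguishable from a real score. Whether to use `None` or a sentinel such as `0` depends on how downstream code treats missing ratings.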

---
## Section 5: Complex Nested Structures

For complex JSON with nested objects and arrays, be very explicit about the structure.

In [None]:
# Complex nested structure
complex_format_prompt = """Extract information from the text and return as structured JSON.

# Output Schema
{
    "company": {
        "name": "string",
        "founded": "number or null",
        "headquarters": "string or null"
    },
    "products": [
        {
            "name": "string",
            "category": "string",
            "price": "number or null"
        }
    ],
    "key_people": [
        {
            "name": "string",
            "role": "string"
        }
    ]
}

# Text
TechCorp, founded in 2010 in San Francisco, is led by CEO Jane Doe and CTO Bob Smith. 
Their flagship products include the Widget Pro ($299) in the hardware category and 
CloudSync (subscription service) in the software category.

# JSON Output
"""

response = call_mistral(user_prompt=complex_format_prompt, temperature=0)
print(response)

# Validate (json is imported here so this cell also runs on its own)
import json

try:
    parsed = json.loads(response)
    print("\n✅ Valid JSON with nested structure!")
    print(f"Company: {parsed.get('company', {}).get('name')}")
    print(f"Products: {len(parsed.get('products', []))}")
    print(f"Key People: {len(parsed.get('key_people', []))}")
except json.JSONDecodeError as e:
    print(f"\n❌ Parse error: {e}")
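
Valid JSON is not necessarily the JSON you asked for: a field can be missing, or an array can come back as a string. The sketch below checks a parsed response against the schema from the prompt using only the standard library. `SCHEMA` and `find_problems` are hypothetical names, and both sample dictionaries are hand-written, not model output.

```python
NUMBER_OR_NULL = (int, float, type(None))
STRING_OR_NULL = (str, type(None))

# Mirrors the schema described in the prompt: dict = object, list = array of items.
SCHEMA = {
    "company": {"name": str, "founded": NUMBER_OR_NULL, "headquarters": STRING_OR_NULL},
    "products": [{"name": str, "category": str, "price": NUMBER_OR_NULL}],
    "key_people": [{"name": str, "role": str}],
}

def find_problems(value, schema, path="$"):
    """Recursively compare a parsed value with the schema; return a list of problems."""
    if isinstance(schema, dict):
        if not isinstance(value, dict):
            return [f"{path}: expected an object"]
        problems = []
        for key, sub_schema in schema.items():
            if key not in value:
                problems.append(f"{path}.{key}: missing")
            else:
                problems.extend(find_problems(value[key], sub_schema, f"{path}.{key}"))
        return problems
    if isinstance(schema, list):
        if not isinstance(value, list):
            return [f"{path}: expected an array"]
        problems = []
        for i, item in enumerate(value):
            problems.extend(find_problems(item, schema[0], f"{path}[{i}]"))
        return problems
    # Leaf: schema is a type or a tuple of allowed types.
    if not isinstance(value, schema):
        return [f"{path}: unexpected type {type(value).__name__}"]
    return []

good = {
    "company": {"name": "TechCorp", "founded": 2010, "headquarters": "San Francisco"},
    "products": [
        {"name": "Widget Pro", "category": "hardware", "price": 299},
        {"name": "CloudSync", "category": "software", "price": None},
    ],
    "key_people": [{"name": "Jane Doe", "role": "CEO"}, {"name": "Bob Smith", "role": "CTO"}],
}
bad = {"company": {"name": "TechCorp", "founded": "2010"}, "products": "Widget Pro", "key_people": []}

print(find_problems(good, SCHEMA))
print(find_problems(bad, SCHEMA))
```

For anything beyond a quick check, a dedicated validation library is likely a better fit. This version only illustrates the idea.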

---
## Section 6: Handling Missing/Optional Fields

Explicitly tell the model what to do when data is missing.

In [None]:
# Handling missing fields
missing_field_prompt = """Extract contact information from the text.

# Output Format
{
    "name": "string",
    "email": "string or null if not found",
    "phone": "string or null if not found",
    "company": "string or null if not found"
}

IMPORTANT: If a field is not mentioned in the text, use null (not "N/A" or "unknown").

# Text
You can reach Sarah at sarah@example.com for any questions.

# JSON
"""

response = call_mistral(user_prompt=missing_field_prompt, temperature=0)
print(response)
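
Even with the instruction above, it can be worth normalizing the parsed result so downstream code always sees the same keys and a real `None` for missing values. This is a minimal sketch. `normalize_contact` and the placeholder list are assumptions for illustration, and the sample input is hand-written.

```python
EXPECTED_FIELDS = ("name", "email", "phone", "company")

# Strings that should be treated as "missing" (an illustrative, non-exhaustive list).
PLACEHOLDERS = {"", "n/a", "na", "none", "null", "unknown", "not found", "not provided"}

def normalize_contact(parsed):
    """Return a dict with every expected field, using None for missing values."""
    normalized = {}
    for field in EXPECTED_FIELDS:
        value = parsed.get(field)  # absent keys become None
        if isinstance(value, str) and value.strip().lower() in PLACEHOLDERS:
            value = None
        normalized[field] = value
    return normalized

# Hypothetical parsed response: "phone" uses a placeholder, "company" is absent.
sample = {"name": "Sarah", "email": "sarah@example.com", "phone": "N/A"}
print(normalize_contact(sample))
```

The prompt instruction remains the first line of defense. The normalizer only catches the cases where the model ignores it.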

---
## Section 7: What NOT to Do (Negative Examples)

### ❌ Vague format request
```
Give me the data in a nice format.
```

### ❌ Conflicting format cues
```
Return as JSON. Use bullet points for each item.
```

### ❌ No example of desired structure
```
Return a JSON object with the relevant fields.
```
This leaves the model to guess which fields to include and what type each one should have.

### ❌ Numeric scale for subjective ratings
```
Rate how good this essay is from 1 to 10.
```

In [None]:
# Bad vs Good format requests
text = "Meeting with John at 3pm tomorrow to discuss the Q4 budget."

# Bad: vague
print("BAD - Vague request:")
print("-" * 40)
bad_prompt = f"Extract the important information from this: {text}"
print(call_mistral(user_prompt=bad_prompt))

print("\n" + "=" * 50 + "\n")

# Good: explicit structure
print("GOOD - Explicit structure:")
print("-" * 40)
good_prompt = f"""Extract meeting details from the text.

Return JSON with this structure:
{{
    "attendees": ["list of names"],
    "time": "time string",
    "date": "date string (relative dates like 'tomorrow' are OK)",
    "topic": "meeting topic"
}}

Text: {text}

JSON:"""
print(call_mistral(user_prompt=good_prompt))

---
## Exercise 1: Free-Form to Structured

Convert a free-form text extraction task to return parseable JSON.

In [None]:
# Task: Extract product information
product_descriptions = [
    "The UltraPhone X features a 6.5 inch display, 128GB storage, and costs $899.",
    "Introducing CloudBook Air - our lightest laptop yet at just 2.5 pounds. Starting at $1299 with 256GB SSD.",
    "SoundPods Pro wireless earbuds offer 8 hours of battery life and active noise cancellation for $199."
]

# TODO: Create a prompt that extracts product info as JSON
# Fields: name, category, price, key_features (array)

extract_prompt = """Extract product information from the description.

Return JSON with structure:
{{
    "name": "product name",
    "category": "phone/laptop/audio/other",
    "price": number,
    "key_features": ["feature1", "feature2"]
}}

Description: {description}

JSON:"""

for desc in product_descriptions:
    prompt = extract_prompt.format(description=desc)
    response = call_mistral(user_prompt=prompt, temperature=0)
    print(f"Input: {desc[:50]}...")
    print(f"Output: {response}")
    print("-" * 50)

---
## Exercise 2: Worded vs Numeric Scale Comparison

Build a rating task and compare both approaches.

In [None]:
# Task: Rate job candidate fit
candidates = [
    "10 years experience, perfect skill match, excellent communication",
    "Entry level, some relevant coursework, eager to learn",
    "5 years experience, missing one key skill, good references"
]

job_requirements = "Senior Python developer with Django experience"

# TODO: Create two versions - numeric (1-5) and worded scale
# Compare the outputs

# Numeric version
numeric_prompt = """Rate how well this candidate fits the job.
Job: {job}
Candidate: {candidate}

Rate 1-5 where 1=poor fit, 5=excellent fit.
Rating:"""

# Worded version
worded_prompt = """Rate how well this candidate fits the job.
Job: {job}
Candidate: {candidate}

Rate using:
- Poor Fit: missing most requirements
- Below Average: missing several key requirements
- Average: meets some requirements
- Good Fit: meets most requirements
- Excellent Fit: exceeds requirements

Rating:"""

print("Comparing approaches:\n")
for candidate in candidates:
    print(f"Candidate: {candidate[:40]}...")
    
    num_response = call_mistral(
        user_prompt=numeric_prompt.format(job=job_requirements, candidate=candidate),
        temperature=0
    )
    word_response = call_mistral(
        user_prompt=worded_prompt.format(job=job_requirements, candidate=candidate),
        temperature=0
    )
    
    print(f"  Numeric: {num_response}")
    print(f"  Worded: {word_response}")
    print()

---
## Exercise 3: Complex Nested Structure

Design a JSON schema for a complex extraction task.

In [None]:
# Task: Extract structured data from a news article summary
article = """
Tech giant Nexus Corp announced today that CEO Maria Garcia will step down after 8 years.
The San Francisco-based company also reported Q3 revenue of $5.2 billion, up 15% year-over-year.
The board has appointed interim CEO James Liu, former COO, effective December 1st.
Analyst Sarah Chen from Goldman Sachs rated the stock as "Buy" with a price target of $450.
"""

# TODO: Design a JSON schema and create the extraction prompt
# Should include:
# - company info (name, location)
# - leadership changes (array of {person, old_role, new_role, effective_date})
# - financial data (revenue, growth)
# - analyst opinions (array of {analyst, firm, rating, target})

extraction_prompt = """
# Your schema and prompt here
"""

# Uncomment to test
# response = call_mistral(user_prompt=extraction_prompt, temperature=0)
# print(response)

---
## Key Takeaways

1. **Be explicit** - Show the exact format you want

2. **Use worded scales** for subjective ratings (Mistral-specific recommendation)

3. **Combine techniques** - Few-shot + explicit schema = most reliable

4. **Handle missing data explicitly** - Tell the model to use null, not "N/A"

5. **End with a format cue** - "JSON:" at the end helps consistency

---

## Next Steps

Now that you can control output format, let's learn how to improve accuracy on complex tasks with step-by-step reasoning.

📚 [Continue to Notebook 7: Step-by-Step Reasoning →](07_step_by_step_reasoning.ipynb)