# <font color='blue'> Sec 8. Prompt Refinement </font>

####  <font color='red'> Content: Matching Recipes to Dietary Restrictions </font>
In this hands-on exercise, you will <font color='red'> refine a prompt that instructs an LLM to **read a recipe and a list of dietary restrictions, then categorize each restriction** </font> as `satisfied`, `not satisfied`, or `undeterminable` based on the information provided.

### Outline:

    0. Setup
    1. Sample Recipes and Dietary Restrictions
    2. Initial Prompt and Evaluation
    3. Prompt Component Analysis
    4. Prompt Refinement Iterations
    5. Testing with Another Recipe
    6. Comparison


## 0. Setup

In [1]:
from openai import OpenAI
from IPython.display import Markdown, display
import os

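import json

# Load the OpenAI API key from a local JSON file
with open('api_key.json', 'r') as file:
    api_keys = json.load(file)
    openai_key = api_keys["openai"]

# print(openai_key)  # avoid printing the key in a shared notebook
client = OpenAI(api_key=openai_key)

# Local helper modules for this exercise
from utils import get_completion
from utils_prompt_refine import (
    dietary_restrictions,
    format_prompt,
)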

## 1. Sample Recipes and Dietary Restrictions

Let's define a few sample recipes and dietary restrictions to test our prompts.

In [2]:
# Define sample recipes
sample_recipes = [
    {
        "name": "Classic Spaghetti Bolognese",
        "ingredients": [
            "1 lb ground beef",
            "1 onion, finely chopped",
            "2 garlic cloves, minced",
            "1 carrot, finely diced",
            "1 celery stalk, finely diced",
            "1 can (14 oz) crushed tomatoes",
            "2 tbsp tomato paste",
            "1 cup beef broth",
            "1 tsp dried oregano",
            "1 bay leaf",
            "1 lb spaghetti",
            "2 tbsp olive oil",
            "Salt and pepper to taste",
            "Grated Parmesan cheese for serving",
        ],
        "instructions": [
            "Heat olive oil in a large pot over medium heat.",
            "Add onion, garlic, carrot, and celery. Cook until softened, about 5 minutes.",
            "Add ground beef and cook until browned, breaking it up as it cooks.",
            "Stir in tomato paste and cook for 1 minute.",
            "Add crushed tomatoes, beef broth, oregano, bay leaf, salt, and pepper.",
            "Bring to a simmer, then reduce heat to low and cook for 1-2 hours.",
            "Cook spaghetti according to package instructions until al dente.",
            "Drain pasta and serve topped with the Bolognese sauce.",
            "Sprinkle with grated Parmesan cheese.",
        ],
    },
    {
        "name": "Vegetable Stir Fry",
        "ingredients": [
            "2 cups mixed vegetables (bell peppers, broccoli, carrots, snap peas)",
            "1 block firm tofu, cubed",
            "2 tbsp vegetable oil",
            "2 cloves garlic, minced",
            "1 tsp ginger, grated",
            "3 tbsp soy sauce",
            "1 tbsp rice vinegar",
            "1 tsp sesame oil",
            "1 tsp cornstarch",
            "2 green onions, sliced",
            "Sesame seeds for garnish",
            "Cooked rice for serving",
        ],
        "instructions": [
            "Press tofu to remove excess water, then cut into cubes.",
            "Mix soy sauce, rice vinegar, sesame oil, and cornstarch in a small bowl.",
            "Heat vegetable oil in a wok or large skillet over high heat.",
            "Add tofu and cook until golden, about 3-4 minutes. Remove and set aside.",
            "Add garlic and ginger to the wok and stir for 30 seconds.",
            "Add vegetables and stir-fry for 4-5 minutes until crisp-tender.",
            "Return tofu to the wok, add sauce mixture, and cook for 1-2 minutes until sauce thickens.",
            "Garnish with green onions and sesame seeds.",
            "Serve over rice.",
        ],
    },
    {
        "name": "Chocolate Chip Cookies",
        "ingredients": [
            "2 1/4 cups all-purpose flour",
            "1 tsp baking soda",
            "1 tsp salt",
            "1 cup (2 sticks) butter, softened",
            "3/4 cup granulated sugar",
            "3/4 cup packed brown sugar",
            "2 large eggs",
            "2 tsp vanilla extract",
            "2 cups semi-sweet chocolate chips",
            "1 cup chopped nuts (optional)",
        ],
        "instructions": [
            "Preheat oven to 375°F (190°C).",
            "Combine flour, baking soda, and salt in a small bowl.",
            "Beat butter, granulated sugar, and brown sugar in a large mixer bowl.",
            "Add eggs one at a time, beating well after each addition.",
            "Beat in vanilla extract.",
            "Gradually beat in flour mixture.",
            "Stir in chocolate chips and nuts if using.",
            "Drop by rounded tablespoon onto ungreased baking sheets.",
            "Bake for 9 to 11 minutes or until golden brown.",
            "Cool on baking sheets for 2 minutes; remove to wire racks to cool completely.",
        ],
    },
]
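
The `format_prompt` helper imported from `utils_prompt_refine` is not shown in this notebook. The sketch below shows one way such a helper could work, assuming it fills the `{{recipe_name}}`-style placeholders in the prompt templates with text built from a recipe dictionary like the ones above. `render_template` and `format_recipe_prompt` are hypothetical names, and the real helper may work differently (for example, it might use a template engine).

```python
import re


def render_template(template, values):
    """Replace {{name}} or {{ name }} placeholders with values[name]."""
    def substitute(match):
        return values[match.group(1)]  # raises KeyError for an unknown placeholder
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


def format_recipe_prompt(recipe, template, restrictions):
    """Hypothetical stand-in for utils_prompt_refine.format_prompt."""
    values = {
        "recipe_name": recipe["name"],
        # Ingredients as a bulleted list
        "recipe_ingredients": "\n".join(f"- {item}" for item in recipe["ingredients"]),
        # Instructions as a numbered list
        "recipe_instructions": "\n".join(
            f"{i}. {step}" for i, step in enumerate(recipe["instructions"], start=1)
        ),
        "dietary_restrictions": "\n".join(f"- {r}" for r in restrictions),
    }
    return render_template(template, values)
```

Unlike the notebook's `format_prompt(recipe, template)`, this sketch takes the restriction list as an explicit argument; the real helper presumably reads the imported `dietary_restrictions` instead.
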


## 2. Initial Prompt and Evaluation

Let's create our initial (basic) prompt and evaluate its performance.

In [3]:
# Define our initial prompt

initial_prompt = """
Analyze the following recipe and determine whether it satisfies each dietary restriction in the list.
For each restriction, classify it as "satisfied", "not satisfied", or "undeterminable" based on the recipe information.

Recipe: {{recipe_name}}

Ingredients:
{{recipe_ingredients}}

Instructions:
{{recipe_instructions}}

Dietary Restrictions to Check:
{{dietary_restrictions}}

Please provide your response in JSON format.
"""

In [4]:
# Test the initial prompt with the spaghetti bolognese recipe
# No changes needed in this cell

test_recipe = sample_recipes[0]  # Spaghetti Bolognese
formatted_prompt = format_prompt(test_recipe, initial_prompt)
initial_response = get_completion(client, user_prompt=formatted_prompt)

print(f"Initial prompt response for {test_recipe['name']}:\n{initial_response}\n")

Initial prompt response for Classic Spaghetti Bolognese:
{
  "vegetarian": "not satisfied",
  "vegan": "not satisfied",
  "gluten-free": "not satisfied",
  "dairy-free": "not satisfied",
  "nut-free": "satisfied",
  "egg-free": "satisfied",
  "low-sodium": "not satisfied",
  "keto": "not satisfied",
  "paleo": "not satisfied",
  "kosher": "undeterminable"
}
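
To work with a flat response like this one, it helps to parse it and check that every value is one of the three allowed labels. The sketch below is illustrative: `extract_json` and `tally_flat_response` are hypothetical helpers, and the fence handling assumes the model may sometimes wrap its JSON in a markdown code fence.

```python
import json
from collections import Counter

ALLOWED_LABELS = {"satisfied", "not satisfied", "undeterminable"}
FENCE = "`" * 3  # markdown code fence marker


def extract_json(text):
    """Parse JSON from a model reply, tolerating an optional fence around it."""
    text = text.strip()
    if text.startswith(FENCE):
        text = text.split("\n", 1)[1]    # drop the opening fence line (e.g. with "json")
        text = text.rsplit(FENCE, 1)[0]  # drop the closing fence
    return json.loads(text)


def tally_flat_response(response_text):
    """Count labels in a flat {restriction: label} response, rejecting unknown labels."""
    data = extract_json(response_text)
    unexpected = {k: v for k, v in data.items() if v not in ALLOWED_LABELS}
    if unexpected:
        raise ValueError(f"Unexpected labels: {unexpected}")
    return Counter(data.values())
```

Calling `tally_flat_response(initial_response)` on the output above would report 7 "not satisfied", 2 "satisfied", and 1 "undeterminable", but nothing about why each label was chosen.
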



## 3. Prompt Component Analysis

Let's analyze different components of our prompt to identify areas for improvement:

1. **Role**: No specific role was provided in our initial prompt.
2. **Task**: Basic instruction to analyze and classify dietary restrictions.
3. **Output Format**: Requested JSON format, but without a clear structure.
4. **Examples**: None provided.
5. **Context**: No additional context on dietary restrictions or assumptions to make.

### Initial Analysis of Problems:

While the response appears valid in format, several potential issues exist:

1. No explanations for the classifications. Let's update the output format to include explanations.
2. Possible misinterpretation of ingredients; for example, "egg-free" is marked "satisfied" even though some pasta contains egg. Let's add more context about ingredients and dietary restrictions.
3. Unclear handling of ambiguities. Let's make the instructions clearer.
4. No explanation for "undeterminable" items such as "kosher". Let's update the output format to include explanations.

## 4. Prompt Refinement Iterations

Let's refine our prompt based on the identified issues. We'll do this in two iterations, improving the prompt in the following aspects:


1. **Role**: No specific role $\to$ a dietary consultant specializing in food allergies and dietary restrictions.
2. **Task**: Basic instruction to analyze and classify dietary restrictions $\to$ detailed reasoning and identification of critical ingredients.
3. **Output Format**: JSON requested without a clear structure $\to$ structured JSON with a classification, an explanation, and the critical ingredients for each restriction.
4. **Examples**: None provided $\to$ a worked (one-shot) example.
5. **Context**: No additional context on dietary restrictions or assumptions to make $\to$ definitions of each dietary restriction.
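
Because each iteration changes several components at once, it can be hard to tell which one drove an improvement. One option is to assemble the prompt from named parts so that components can be switched on one at a time. The sketch below is a hypothetical helper (`build_prompt` and `bullet_block` are not part of the notebook's utilities); it only joins strings and leaves the `{{ ... }}` placeholders untouched for later filling.

```python
def bullet_block(title, items):
    """Render a titled bullet list, or an empty string if there are no items."""
    if not items:
        return ""
    return title + "\n" + "\n".join(f"- {item}" for item in items)


def build_prompt(role="", task="", definitions=(), guidelines=(), example="",
                 recipe_block="", output_format=""):
    """Join the non-empty prompt components, in a fixed order, separated by blank lines."""
    sections = [
        role,
        task,
        bullet_block("Important context and definitions for dietary restrictions:", definitions),
        bullet_block("Guidelines for your analysis:", guidelines),
        ("Example analysis:\n" + example) if example else "",
        recipe_block,    # e.g. the Recipe/Ingredients/Instructions placeholders
        output_format,
    ]
    return "\n\n".join(s.strip() for s in sections if s.strip())
```

Adding one argument at a time (role, then definitions, then guidelines) yields a series of prompts whose responses can be compared to isolate each component's effect.
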


In [5]:
# Iteration 1: Adding role, context, and clarifying the task

refined_prompt_1 = """
You are a dietary consultant specializing in food allergies and dietary restrictions.

Your task is to analyze the following recipe and determine whether it satisfies each dietary restriction in the list.
For each restriction, classify it as "satisfied," "not satisfied," or "undeterminable" based on the recipe information.

Important context and definitions for dietary restrictions:
- Vegetarian: No meat, poultry, fish, or seafood. May include eggs and dairy.
- Vegan: Diet excludes all animal products, including meat, dairy, eggs, honey, and gelatin.
- Gluten-free: No gluten, a protein found in wheat, rye, and barley.
- Dairy-free: No animal milk or milk-derived ingredients, including products from cows, sheep, goats, and other mammals.
- Nut-free: Free from both tree nuts and peanuts (which are legumes).
- Egg-free: Contains no eggs or egg products.
- Low-sodium: Limits salt and naturally high-sodium ingredients.
- Keto: A high-fat, low-carbohydrate diet that aims to put the body into a state called ketosis.
- Paleo: A dietary pattern based on foods that early humans likely ate during the Paleolithic era.
- Kosher: foods, drinks, or items that are fit to eat or use under Jewish dietary laws, known as Kashrut, which dictate what is permissible and how it must be prepared.

Guidelines for your analysis:
- Mark a restriction as "satisfied" only if you are certain the recipe meets it.
- Mark a restriction as "not satisfied" if you are certain the recipe does not meet it.
- Mark a restriction as "undeterminable" if the recipe information is not sufficient to decide either way.
- For each classification, briefly explain your reasoning.

Recipe: {{ recipe_name }}

Ingredients:
{{ recipe_ingredients }}

Instructions:
{{ recipe_instructions }}

Dietary Restrictions to Check:
{{ dietary_restrictions }}

Please format your response as a JSON object where:
- Each key is the name of a dietary restriction
- Each value is an object with properties:
  - "classification": "satisfied", "not satisfied", or "undeterminable"
  - "explanation": brief reasoning for your classification
  - "critical_ingredients": array of ingredients that determined your classification
"""

In [6]:
# Test refined prompt 1 with the same recipe

formatted_prompt = format_prompt(test_recipe, refined_prompt_1)
refined_response_1 = get_completion(client, user_prompt=formatted_prompt)

print(f"Iteration 1 response for Spaghetti Bolognese:\n{refined_response_1}\n")

Iteration 1 response for Spaghetti Bolognese:
{
  "vegetarian": {
    "classification": "not satisfied",
    "explanation": "Contains ground beef, which is meat, so it does not meet vegetarian criteria.",
    "critical_ingredients": ["ground beef"]
  },
  "vegan": {
    "classification": "not satisfied",
    "explanation": "Contains ground beef, Parmesan cheese, and possibly other animal-derived ingredients, violating vegan restrictions.",
    "critical_ingredients": ["ground beef", "Parmesan cheese"]
  },
  "gluten-free": {
    "classification": "not satisfied",
    "explanation": "Contains spaghetti, which is typically made from wheat, a gluten source.",
    "critical_ingredients": ["spaghetti"]
  },
  "dairy-free": {
    "classification": "not satisfied",
    "explanation": "Includes grated Parmesan cheese, which is dairy.",
    "critical_ingredients": ["Parmesan cheese"]
  },
  "nut-free": {
    "classification": "undeterminable",
    "explanation": "The recipe does not specify whe

### Analysis of First Iteration

The first refined prompt makes four changes:
1. Added clear definitions for each dietary restriction
2. Provided guidelines for determining classifications
3. Requested explanations and critical ingredients
4. Specified a more detailed output format

Improvements in the response:
* Clear explanations for each classification
* Identification of specific ingredients that affect the decision

Let's make one more refinement to address potential ambiguities and add an example.
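
Deciding which ingredients need explicit rules is easier with a list of candidates. The sketch below scans an ingredient list for keywords whose dietary status depends on details the recipe omits. The lookup table is a hypothetical starting point for illustration, not a complete or authoritative reference; extend it with the ambiguous ingredients you identify.

```python
# Hypothetical notes on ingredients whose status depends on unstated details
AMBIGUOUS_INGREDIENTS = {
    "soy sauce": "traditional soy sauce is brewed with wheat (gluten) unless labeled gluten-free",
    "broth": "source animal and sodium content vary by product",
    "spaghetti": "dried pasta is usually egg-free, but fresh or egg pasta contains egg",
    "chocolate chips": "may contain milk ingredients; check the label",
}


def flag_ambiguous(ingredients, notes=AMBIGUOUS_INGREDIENTS):
    """Return 'ingredient: note' lines for ingredients that contain a known ambiguous keyword."""
    flagged = []
    for item in ingredients:
        lowered = item.lower()
        for keyword, note in notes.items():
            if keyword in lowered:  # simple substring match
                flagged.append(f"{item}: {note}")
    return flagged
```

For the Bolognese recipe this would flag the beef broth and the spaghetti; the resulting notes could inform the lines under "Handling ambiguities" in the next prompt.
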

In [7]:
# Iteration 2: Adding an example and more guidance on ambiguities

refined_prompt_2 = """
You are a dietary consultant specializing in food allergies and dietary restrictions.

Your task is to analyze the following recipe and determine whether it satisfies each dietary restriction in the list.
For each restriction, classify it as "satisfied," "not satisfied," or "undeterminable" based on the recipe information.

Important context and definitions for dietary restrictions:
- Vegetarian: No meat, poultry, fish, or seafood. May include eggs and dairy.
- Vegan: No animal products whatsoever, including meat, dairy, eggs, honey.
- Gluten-free: No wheat, barley, rye, or derivatives. Note that regular all-purpose flour contains gluten.
- Dairy-free: No milk, cheese, butter, cream, or other dairy products.
- Nut-free: No tree nuts or peanuts.
- Egg-free: No eggs or products containing eggs.
- Low-sodium: Limited salt and naturally high-sodium ingredients.
- Keto: Very low carbohydrate, high fat, moderate protein.
- Paleo: No grains, legumes, dairy, refined sugar, or processed foods.
- Kosher: Follows Jewish dietary laws (e.g., no pork or shellfish, and no mixing of meat and dairy).

Guidelines for your analysis:
- Mark a restriction as "satisfied" only if you are certain the recipe meets it.
- Mark a restriction as "not satisfied" if any ingredient clearly violates it.
- Mark a restriction as "undeterminable" if you lack sufficient information (e.g., exact type of broth, potential cross-contamination).
- For each classification, briefly explain your reasoning and identify specific ingredients that affect your decision.

Handling ambiguities:
- For "vegetable oil" or unspecified oil, consider it plant-based unless otherwise noted.
- For tofu, assume standard tofu made from soybeans, which is purely plant-based. <--- Think of other ingredients that might be ambiguous
- ********** <--- Consider how to handle cross-contamination
- etc

Example analysis for a simple recipe:

```
Recipe: Basic Pancakes
Ingredients:
- 1 cup all-purpose flour
- 2 tbsp sugar
- 1 tsp baking powder
- 1/2 tsp salt
- 1 egg
- 1 cup milk
- 2 tbsp butter, melted

Response:
{
  "vegetarian": {
    "classification": "satisfied",
    "explanation": "All ingredients are vegetarian; contains no meat, poultry, fish, or seafood.",
    "critical_ingredients": []
  },
  "vegan": {
    "classification": "not satisfied",
    "explanation": "Contains animal products.",
    "critical_ingredients": ["1 egg", "1 cup milk", "2 tbsp butter, melted"]
  },
  "gluten-free": {
    "classification": "not satisfied",
    "explanation": "Contains all-purpose flour which contains gluten.",
    "critical_ingredients": ["1 cup all-purpose flour"]
  }
}
```

Recipe to analyze: {{ recipe_name }}

Ingredients:
{{ recipe_ingredients }}

Instructions:
{{ recipe_instructions }}

Dietary Restrictions to Check:
{{ dietary_restrictions }}

Please format your response as a JSON object where:
- Each key is the name of a dietary restriction
- Each value is an object with properties:
  - "classification": "satisfied", "not satisfied", or "undeterminable"
  - "explanation": brief reasoning for your classification
  - "critical_ingredients": array of ingredients that determined your classification
"""

In [8]:
# Test refined prompt 2 with the vegetable stir fry recipe

test_recipe_2 = sample_recipes[1]  # Vegetable Stir Fry
formatted_prompt = format_prompt(test_recipe_2, refined_prompt_2)
refined_response_2 = get_completion(client, user_prompt=formatted_prompt)

print(f"Iteration 2 response for Vegetable Stir Fry:\n{refined_response_2}\n")

Iteration 2 response for Vegetable Stir Fry:
{
  "vegetarian": {
    "classification": "satisfied",
    "explanation": "All ingredients are plant-based or plant-derived; no meat, poultry, fish, or seafood are present.",
    "critical_ingredients": []
  },
  "vegan": {
    "classification": "not satisfied",
    "explanation": "Contains soy sauce and possibly rice vinegar, which may contain traces of animal-derived additives or be processed with animal products; also, the presence of tofu is generally vegan, but soy-based products can sometimes be processed with animal-derived ingredients. Additionally, sesame oil and seeds are plant-based, but cross-contamination is possible.",
    "critical_ingredients": ["soy sauce", "rice vinegar"]
  },
  "gluten-free": {
    "classification": "not satisfied",
    "explanation": "Contains soy sauce and cornstarch; traditional soy sauce contains gluten unless specified gluten-free. Cornstarch is gluten-free, but without confirmation, gluten presence i

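The prompt's own guideline reserves "not satisfied" for ingredients that clearly violate a restriction, yet the vegan explanation above rests on words like "possibly" and "may". A crude heuristic check for such cases is sketched below; the word list is an assumption and will produce both false positives and misses, so treat its output as a prompt for manual review.

```python
import re

# Words that signal speculation rather than a definite violation (heuristic, not exhaustive)
HEDGE_PATTERN = re.compile(r"\b(may|might|possibly|potentially|potential|could|unless)\b", re.IGNORECASE)


def find_hedged_violations(response):
    """List 'not satisfied' entries whose explanation contains hedging words."""
    flagged = []
    for restriction, entry in response.items():
        if entry.get("classification") != "not satisfied":
            continue
        hedges = sorted({m.group(1).lower() for m in HEDGE_PATTERN.finditer(entry.get("explanation", ""))})
        if hedges:
            flagged.append(f"{restriction}: hedged terms {hedges}")
    return flagged
```

Applied to the vegan entry above, this reports `vegan: hedged terms ['may', 'possibly']`, a hint that "undeterminable" may fit the prompt's rules better.
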
## 5. Testing with Another Recipe

Let's test our refined prompt with the third recipe.

In [14]:
# Test with the chocolate chip cookies recipe

test_recipe_3 = sample_recipes[2]  # Chocolate Chip Cookies
formatted_prompt = format_prompt(test_recipe_3, refined_prompt_2)
final_test_response = get_completion(client, user_prompt=formatted_prompt)

print(sample_recipes[2]["name"])
print(f"Refined prompt test with Chocolate Chip Cookies:\n{final_test_response}\n")

Chocolate Chip Cookies
Refined prompt test with Chocolate Chip Cookies:
{
  "vegetarian": {
    "classification": "satisfied",
    "explanation": "All ingredients are plant-based or derived from plants, with no meat, poultry, fish, or seafood. Butter is a dairy product, but vegetarian diets typically include dairy.",
    "critical_ingredients": []
  },
  "vegan": {
    "classification": "not satisfied",
    "explanation": "Contains butter and eggs, which are animal-derived products not permitted in vegan diets.",
    "critical_ingredients": ["butter", "eggs"]
  },
  "gluten-free": {
    "classification": "not satisfied",
    "explanation": "Contains all-purpose flour, which contains gluten.",
    "critical_ingredients": ["all-purpose flour"]
  },
  "dairy-free": {
    "classification": "not satisfied",
    "explanation": "Contains butter, a dairy product.",
    "critical_ingredients": ["butter"]
  },
  "nut-free": {
    "classification": "undeterminable",
    "explanation": "Contains c

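A simple sanity check on responses like this one is to confirm that every entry in `critical_ingredients` actually appears in the recipe's ingredient list, which catches ingredients the model invented. The sketch below uses case-insensitive substring matching, which is an assumption: it accepts "eggs" for "2 large eggs" but would miss "eggs" against "1 egg".

```python
def ungrounded_ingredients(response, recipe_ingredients):
    """Map each restriction to critical ingredients not found in the recipe's ingredient list."""
    lowered = [item.lower() for item in recipe_ingredients]
    result = {}
    for restriction, entry in response.items():
        missing = [
            ing for ing in entry.get("critical_ingredients", [])
            if not any(ing.lower() in item for item in lowered)
        ]
        if missing:
            result[restriction] = missing
    return result
```

For the cookie response above, "butter", "eggs", and "all-purpose flour" all match listed ingredients, so those entries would pass.
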
## 6. Comparison

Let's informally compare the outputs of the initial and refined prompts. The cell below reruns the Iteration 1 prompt on the Spaghetti Bolognese recipe, the same recipe used with the initial prompt.

In [9]:
test_recipe = sample_recipes[0]  # Spaghetti Bolognese
formatted_prompt = format_prompt(test_recipe, refined_prompt_1)
refined_response_1 = get_completion(client, user_prompt=formatted_prompt)

print(f"Iteration 1 response for Spaghetti Bolognese:\n{refined_response_1}\n")

Iteration 1 response for Spaghetti Bolognese:
{
  "vegetarian": {
    "classification": "not satisfied",
    "explanation": "Contains ground beef, which is meat, violating vegetarian criteria.",
    "critical_ingredients": ["ground beef"]
  },
  "vegan": {
    "classification": "not satisfied",
    "explanation": "Contains ground beef, Parmesan cheese, and possibly other animal-derived ingredients, violating vegan criteria.",
    "critical_ingredients": ["ground beef", "Parmesan cheese"]
  },
  "gluten-free": {
    "classification": "not satisfied",
    "explanation": "Contains spaghetti, which is typically made from wheat and contains gluten.",
    "critical_ingredients": ["spaghetti"]
  },
  "dairy-free": {
    "classification": "not satisfied",
    "explanation": "Includes grated Parmesan cheese, which is dairy.",
    "critical_ingredients": ["Parmesan cheese"]
  },
  "nut-free": {
    "classification": "undeterminable",
    "explanation": "No nuts or nuts-derived ingredients are li

### Prompt Comparison

| Component | Initial Prompt | Final Prompt |
|-----------|---------------|--------------|
| Role | None specified | Dietary consultant specializing in food allergies and dietary restrictions |
| Context | Minimal | Detailed definitions of each dietary restriction |
| Task | Basic: analyze and classify | Analyze, classify with reasoning, and identify critical ingredients |
| Output Format | Simple JSON | Structured JSON with classification, explanation, and critical ingredients |
| Examples | None | A worked example showing ingredients and the expected response |

### Response Comparison

| Aspect | Initial Response | Final Response |
|--------|-----------------|----------------|
| Format | Simple key-value pairs | Structured sub-fields: classification, explanation, and critical ingredients |
| Accuracy | Not measured | Not measured |
| Transparency | No explanation of reasoning | Clear explanations that cite the critical ingredients |
| Handling Ambiguity | No reasons given (e.g., "kosher" is marked "undeterminable" without explanation) | Undeterminable cases identified with reasoning |
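
Accuracy is listed as "Not measured" for both prompts. A minimal way to start measuring it is to hand-label a few restrictions per recipe and score each response against those labels. The sketch below assumes such hand labels exist; `to_labels` and `accuracy` are hypothetical helpers, and any gold labels you write are judgment calls rather than ground truth.

```python
def to_labels(response):
    """Reduce a flat or nested response to {restriction: label}."""
    return {k: v["classification"] if isinstance(v, dict) else v for k, v in response.items()}


def accuracy(predicted, gold):
    """Return (fraction correct, mismatched restrictions) over the gold-labeled restrictions."""
    if not gold:
        raise ValueError("gold labels are empty")
    mismatched = [k for k, label in gold.items() if predicted.get(k) != label]
    return (len(gold) - len(mismatched)) / len(gold), mismatched
```

For example, with hypothetical hand labels of "not satisfied" for vegetarian and dairy-free and "undeterminable" for egg-free on the Bolognese recipe, the initial response would score 2/3 (missing egg-free) while the refined response would score 1.0.
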

### Response Comparison for Spaghetti Bolognese by Initial/Refined Prompt

| Response by initial prompt| Response by refined prompt|
|-----------------|----------------|
|  <code>{<br>  "vegetarian": "not satisfied",<br>  "nut-free": "satisfied",<br>  "egg-free": "satisfied",<br> .....<br>}</code> | <code>{<br>  "vegetarian": {<br>  "classification": "not satisfied",<br>  "explanation": "Contains ground beef, which is meat, violating vegetarian criteria.",<br>  "critical_ingredients": ["ground beef"]<br>},<br>  "nut-free": {<br>  "classification": "undeterminable",<br>  "explanation": "No nuts or nuts-derived ingredients are listed, but cross-contamination or hidden ingredients are unknown.",<br>  "critical_ingredients": []<br>},<br>  "egg-free": {<br>  "classification": "undeterminable",<br>  "explanation": "No eggs are explicitly listed, but some pasta varieties contain eggs; without specific pasta details, cannot confirm.",<br>  "critical_ingredients": ["spaghetti"]<br>},<br>  .....<br>}</code> |
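
The side-by-side view above can also be produced programmatically by listing only the restrictions whose label changed between the two responses. The sketch below assumes the initial response is flat and the refined one is nested, as in this notebook; `label_changes` is a hypothetical helper.

```python
def label_changes(initial, refined):
    """Return {restriction: (initial_label, refined_label)} for restrictions whose label changed."""
    changes = {}
    for restriction, old_label in initial.items():
        entry = refined.get(restriction)
        if entry is None:  # restriction missing from the refined response
            continue
        new_label = entry["classification"]
        if new_label != old_label:
            changes[restriction] = (old_label, new_label)
    return changes
```

With the three restrictions shown in the table, this would report that "nut-free" and "egg-free" moved from "satisfied" to "undeterminable", while "vegetarian" stayed "not satisfied".
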
