# Create Test Queries

This notebook builds test queries from dietary keywords and recipes. The generated queries read like natural user requests, for example:

- "I have diabetes"
- "I'm vegan"
- "I'm gluten intolerant"
- "I have chicken, what can I make?"
- "I want to cook something healthy with rice"


In [1]:
import sys
import random
import pandas as pd
import ast
from pathlib import Path
from dotenv import load_dotenv

# Add project root to path
project_root = Path.cwd()
if str(project_root) not in sys.path:
    sys.path.insert(0, str(project_root))

# Load environment
load_dotenv()

# Import keywords
from src.keywords import KEYWORDS

# set random seed for reproducibility
random.seed(30)

print(f"Keywords: {KEYWORDS}")

# Load recipes
recipes_df = pd.read_csv('data/small_recipes.csv')
print(f"Loaded {len(recipes_df)} recipes")

Keywords: ['vegan', 'vegetarian', 'gluten-free', 'kosher', 'lactose-free', 'low-carb', 'inexpensive', 'high-protein', 'diabetic', '15-minutes-or-less', 'healthy']
Loaded 450 recipes


## Query Templates

Four types of templates:
- Standard: "I want a {keyword} recipe"
- Personal condition: "I have diabetes", "I'm vegan"
- With ingredient: "I have chicken, what can I make?", "I want to cook something vegan with rice"
- Condition and ingredient combined: "I have diabetes and chicken, what can I make?"

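Distribution:
- ~30% include a specific ingredient (about half of these use the combined condition-and-ingredient phrasing when the keyword has such templates)
- ~35% use personal condition phrasing (keywords without condition templates fall back to standard templates)
- ~35% use standard templates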
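
The split can be checked with a small simulation. The sketch below mirrors the thresholds used in the generation cell later in this notebook (`0.3` and `0.65`); the helper names `pick_query_type` and `simulate_shares` and the seed are assumptions made for illustration, not part of the notebook.

```python
import random
from collections import Counter

def pick_query_type(rng, has_ingredients=True, has_condition_templates=True):
    """Mirror the three-way branch used when one query is generated."""
    rand = rng.random()
    if rand < 0.3 and has_ingredients:
        return "ingredient"
    if rand < 0.65 and has_condition_templates:
        return "condition"
    return "standard"

def simulate_shares(n, seed, **flags):
    """Return the observed share of each query type over n draws."""
    rng = random.Random(seed)  # local generator, leaves the global seed untouched
    counts = Counter(pick_query_type(rng, **flags) for _ in range(n))
    return {kind: counts[kind] / n for kind in ("ingredient", "condition", "standard")}

# Keyword that has personal-condition templates (e.g. "vegan")
shares = simulate_shares(100_000, seed=0)
# Keyword without them (e.g. "kosher"): the condition branch is never taken
shares_no_condition = simulate_shares(100_000, seed=0, has_condition_templates=False)

print(shares)
print(shares_no_condition)
```

For a keyword such as `kosher`, which has no entry in `condition_templates`, the condition share drops to zero and the standard share rises to roughly 70%.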


In [2]:
# Standard
standard_templates = [
    "I want a {keyword} recipe",
    "Show me {keyword} recipes",
    "I need {keyword} meal ideas",
    "Looking for {keyword} dishes",
    "Can you suggest {keyword} food?",
    "I want a quick {keyword} recipe",
    "Show me easy {keyword} dishes",
    "What's a good {keyword} meal?",
]
# With ingredients
ingredient_templates = [
    "I have {ingredient}, what can I make that's {keyword}?",
    "I want to cook something {keyword} with {ingredient}",
    "Show me {keyword} recipes with {ingredient}",
    "I have {ingredient}, need {keyword} recipe ideas",
    "What {keyword} dishes can I make with {ingredient}?",
    "I want a {keyword} recipe using {ingredient}",
]

# Personal condition 
condition_templates = {
    "vegan": [
        "I'm vegan, what can I cook?",
        "I'm vegan, show me recipes",
        "I don't eat animal products, what should I make?",
        "I'm vegan and need dinner ideas",
        "I follow a vegan diet, what can I cook?",
        "I'm plant-based, show me recipes",
    ],
    "vegetarian": [
        "I'm vegetarian, what can I cook?",
        "I'm vegetarian, show me recipes",
        "I don't eat meat, what should I make?",
        "I'm vegetarian and need meal ideas",
        "I follow a vegetarian diet, help me",
    ],
    "gluten-free": [
        "I'm gluten intolerant, what can I eat?",
        "I have celiac disease, show me recipes",
        "I can't eat gluten, what should I cook?",
        "I'm gluten-free, need recipe ideas",
        "I need gluten-free meal suggestions",
        "I can't have wheat, what can I make?",
    ],
    "lactose-free": [
        "I'm lactose intolerant, what can I cook?",
        "I can't have dairy, show me recipes",
        "I'm lactose intolerant, need meal ideas",
        "I can't eat dairy products, help me",
        "I need lactose-free recipes",
        "I have lactose intolerance, what should I make?",
    ],
    "diabetic": [
        "I have diabetes, what can I cook?",
        "I'm diabetic, show me recipes",
        "I have diabetes, need meal ideas",
        "I'm diabetic and need healthy recipes",
        "I have type 2 diabetes, what should I eat?",
        "I need diabetic-friendly recipes",
    ],
    "low-carb": [
        "I'm on a low-carb diet, what can I make?",
        "I'm doing keto, show me recipes",
        "I need low-carb meal ideas",
        "I'm cutting carbs, what should I cook?",
        "I'm on a low-carb diet, help me",
    ],
    "high-protein": [
        "I need high-protein meals",
        "I'm building muscle, show me high-protein recipes",
        "I need more protein in my diet",
        "I'm looking for protein-rich meals",
        "I need high-protein meal ideas",
    ],
    "healthy": [
        "I want to eat healthier, what should I cook?",
        "I'm trying to eat healthy, show me recipes",
        "I need healthy meal ideas",
        "I want nutritious recipes",
        "I'm on a health kick, what can I make?",
    ],
    "15-minutes-or-less": [
        "I'm in a hurry, what's quick to make?",
        "I need something fast, show me quick recipes",
        "I only have 15 minutes, what can I cook?",
        "I need quick meal ideas",
        "I'm short on time, help me",
    ],
    "inexpensive": [
        "I'm on a budget, what can I cook?",
        "I need cheap meal ideas",
        "I'm broke, show me affordable recipes",
        "I need budget-friendly recipes",
        "I want to save money, what should I make?",
    ],
}

# Condition with ingredients
condition_ingredient_templates = {
    "vegan": [
        "I'm vegan and I have {ingredient}, what can I make?",
        "I'm vegan, show me recipes with {ingredient}",
        "I have {ingredient}, what vegan dishes can I cook?",
    ],
    "vegetarian": [
        "I'm vegetarian and I have {ingredient}, what can I make?",
        "I'm vegetarian, show me recipes with {ingredient}",
        "I have {ingredient}, what vegetarian dishes can I cook?",
    ],
    "gluten-free": [
        "I'm gluten intolerant and I have {ingredient}, what can I make?",
        "I can't eat gluten, show me recipes with {ingredient}",
        "I have {ingredient}, what gluten-free dishes can I cook?",
    ],
    "lactose-free": [
        "I'm lactose intolerant and I have {ingredient}, what can I make?",
        "I can't have dairy, show me recipes with {ingredient}",
        "I have {ingredient}, what lactose-free dishes can I cook?",
    ],
    "diabetic": [
        "I have diabetes and {ingredient}, what can I make?",
        "I'm diabetic, show me recipes with {ingredient}",
        "I have {ingredient}, what diabetic-friendly dishes can I cook?",
    ],
    "healthy": [
        "I want to eat healthy and I have {ingredient}, what can I make?",
        "I'm trying to be healthy, show me recipes with {ingredient}",
        "I have {ingredient}, what healthy dishes can I cook?",
    ],
}
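
Because every template is filled with `str.format`, a misspelled placeholder only fails when that template happens to be chosen. A quick check like the following can catch this up front. It is a minimal sketch: the helpers `placeholders` and `find_bad_templates` and the sample lists are hypothetical, and in the notebook you would pass the real template lists instead.

```python
from string import Formatter

def placeholders(template):
    """Return the set of named fields used in a str.format template."""
    return {field for _, field, _, _ in Formatter().parse(template) if field}

def find_bad_templates(templates, allowed):
    """Return the templates that use a field outside the allowed set."""
    return [t for t in templates if not placeholders(t) <= allowed]

# Small stand-in samples (hypothetical); the last one contains a deliberate typo
sample_standard = ["I want a {keyword} recipe", "Show me {keyword} recipes"]
sample_ingredient = ["I want a {keyword} recipe using {ingredient}"]
sample_broken = ["I want a {vegan } recipe"]

ok_standard = find_bad_templates(sample_standard, {"keyword"})
ok_ingredient = find_bad_templates(sample_ingredient, {"keyword", "ingredient"})
bad = find_bad_templates(sample_broken, {"keyword"})

print(ok_standard, ok_ingredient, bad)
```

For the dictionaries, the same check could be run per keyword, with `{"ingredient"}` as the allowed set for `condition_ingredient_templates` and an empty set for `condition_templates`.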

## Generate Test Cases

In [3]:
test_cases = []
N_PER_KEYWORD = 8
N_TEST_CASES = 100

# Generate up to N_PER_KEYWORD test cases for each keyword
for keyword in KEYWORDS:
    # Find recipes matching this keyword
    matching_recipes = []
    for idx, row in recipes_df.iterrows():
        tags = ast.literal_eval(row['tags'])
        tags_str = ' '.join(tags).lower()
        if keyword.lower() in tags_str:
            matching_recipes.append(row)
    
    print(f"{keyword}: found {len(matching_recipes)} recipes")
    
    # Sample N_PER_KEYWORD recipes for this keyword
    if len(matching_recipes) > N_PER_KEYWORD:
        sampled = random.sample(matching_recipes, N_PER_KEYWORD)
    else:
        sampled = matching_recipes
    
    # Create a query for each sampled recipe
    for recipe in sampled:
        # Get ingredients from the recipe
        ingredients = ast.literal_eval(recipe['ingredients'])
        
        # Decide which type of query to generate
        rand = random.random()
        
        # 30% chance to include an ingredient
        if rand < 0.3 and len(ingredients) > 0:
            ingredient = random.choice(ingredients)
            
            # 50% chance to use condition+ingredient template if available
            if keyword in condition_ingredient_templates and random.random() < 0.5:
                template = random.choice(condition_ingredient_templates[keyword])
                query = template.format(ingredient=ingredient)
            else:
                # Use standard ingredient template
                template = random.choice(ingredient_templates)
                query = template.format(keyword=keyword, ingredient=ingredient)
        
        # 35% chance to use personal condition template
        elif rand < 0.65 and keyword in condition_templates:
            query = random.choice(condition_templates[keyword])
        
        # 35% chance to use standard template
        else:
            template = random.choice(standard_templates)
            query = template.format(keyword=keyword)
        
        test_cases.append({
            "query": query,
            "keyword": keyword,
            "recipe_name": recipe['name'],
            "recipe_id": int(recipe['id'])
        })

# Randomize order
random.shuffle(test_cases)

# Keep at most N_TEST_CASES
test_cases = test_cases[:N_TEST_CASES]

print(f"\nGenerated {len(test_cases)} test cases")

vegan: found 65 recipes
vegetarian: found 148 recipes
gluten-free: found 56 recipes
kosher: found 47 recipes
lactose-free: found 0 recipes
low-carb: found 121 recipes
inexpensive: found 120 recipes
high-protein: found 55 recipes
diabetic: found 45 recipes
15-minutes-or-less: found 45 recipes
healthy: found 66 recipes

Generated 80 test cases
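
Since `lactose-free` matches no recipes, only 10 keywords contribute 8 queries each, which gives the 80 test cases reported above. The matching rule is a substring search over the space-joined tags, so it behaves differently from exact tag membership. The sketch below contrasts the two; the tag strings are made up for illustration and are not taken from `small_recipes.csv`, and `matches_exact` is a hypothetical alternative, not something the notebook uses.

```python
import ast

def matches_substring(tags_literal, keyword):
    """Same rule as the generation loop: substring search in the joined tags."""
    tags = ast.literal_eval(tags_literal)
    return keyword.lower() in " ".join(tags).lower()

def matches_exact(tags_literal, keyword):
    """Stricter alternative (assumption): the keyword must equal one whole tag."""
    tags = ast.literal_eval(tags_literal)
    return keyword.lower() in {tag.lower() for tag in tags}

# Hypothetical tag columns, stored as stringified Python lists like in the CSV
rows = [
    "['vegan', 'dinner']",
    "['very-low-carbs', 'main-dish']",
    "['healthy-2', 'low-fat']",
]

for row in rows:
    for keyword in ("vegan", "low-carb", "healthy"):
        print(row, keyword, matches_substring(row, keyword), matches_exact(row, keyword))
```

With substring matching, a keyword also matches longer tags that merely contain it. One possible explanation for the zero `lactose-free` matches, not verified here, is that the dataset spells that tag differently.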


In [4]:
# Print first 10 test cases
for i, test in enumerate(test_cases[:10]):
    print(f"{i}. Query: {test['query']}")

0. Query: I want a kosher recipe
1. Query: I have orange peel, what can I make that's low-carb?
2. Query: I want a quick kosher recipe
3. Query: I need cheap meal ideas
4. Query: I'm trying to be healthy, show me recipes with chicken broth
5. Query: I'm vegetarian and need meal ideas
6. Query: I'm diabetic and need healthy recipes
7. Query: I need cheap meal ideas
8. Query: I need kosher meal ideas
9. Query: I need low-carb meal ideas


## Run Test Cases

In [5]:
from src.inference import InferenceEngine

print("Initializing engine...")
engine = InferenceEngine.from_config()

2026-01-21 18:40:28 - INFO - chromadb.telemetry.product.posthog - Anonymized telemetry enabled. See                     https://docs.trychroma.com/telemetry for more information.
2026-01-21 18:40:28 - INFO - Inference - Document store found with 450 documents.


Initializing engine...


2026-01-21 18:40:28 - INFO - httpx - HTTP Request: GET https://huggingface.co/api/models/BAAI/bge-base-en-v1.5 "HTTP/1.1 200 OK"
2026-01-21 18:40:29 - INFO - httpx - HTTP Request: GET https://huggingface.co/api/models/Qwen/Qwen2.5-7B-Instruct "HTTP/1.1 200 OK"


In [6]:
# Test the first query
test = test_cases[0]

message_history = [{"role": "user", "content": test['query']}]
response = ""
for token in engine.stream_response(message_history):
    print(token, end="", flush=True)
    response += token

2026-01-21 18:40:29 - INFO - Inference - Search Query: I want a kosher recipe
2026-01-21 18:40:29 - INFO - haystack.core.pipeline.pipeline - Running component query_embedder
2026-01-21 18:40:29 - INFO - httpx - HTTP Request: GET https://huggingface.co/api/models/BAAI/bge-base-en-v1.5?expand=inferenceProviderMapping "HTTP/1.1 200 OK"
2026-01-21 18:40:29 - INFO - httpx - HTTP Request: GET https://huggingface.co/api/models/BAAI/bge-base-en-v1.5 "HTTP/1.1 200 OK"
2026-01-21 18:40:29 - INFO - httpx - HTTP Request: POST https://router.huggingface.co/hf-inference/models/BAAI/bge-base-en-v1.5/pipeline/feature-extraction "HTTP/1.1 200 OK"
2026-01-21 18:40:29 - INFO - haystack.core.pipeline.pipeline - Running component retriever
2026-01-21 18:40:30 - INFO - httpx - HTTP Request: POST https://router.huggingface.co/v1/chat/completions "HTTP/1.1 200 OK"
2026-01-21 18:40:30 - INFO - Inference - Extracted keywords: ['kosher']
2026-01-21 18:40:30 - INFO - Inference - Filtered down to 4 documents based

Based on the provided context, here is a kosher recipe:

**Name: Lentil Soup in 10 Minutes Pressure Cooker**

**Ingredients:**
- Olive oil
- Onion
- Carrots
- Bay leaves
- Fresh thyme
- Winter savory
- Water
- Green lentils
- Red lentils
- Potato
- Kosher salt
- Fresh ground black pepper
- Parmesan cheese

**Instructions:**
1. Heat the oil in the pressure cooker over medium heat.
2. Add the onion and sauté until soft, about 2 minutes.
3. Add the carrots and sauté for another minute.
4. Add the bay leaves, thyme, and winter savory as desired.
5. Add water or broth, green lentils, red lentils, and potato. Stir well.
6. Lock the lid in place on the cooker and bring the mixture to a boil over high heat until you achieve high pressure.
7. Reduce the heat to maintain pressure for 6 minutes.
8. Remove from the heat and let the pressure come down naturally. Remove the lid, tilting it away from you.
9. Remove the bay leaves and thyme stems.
10. Add salt and pepper to taste.
11. Top with grated 

In [7]:
# Run every test query through the engine and store the responses
results = []

for test in test_cases:
    
    message_history = [{"role": "user", "content": test['query']}]
    
    response = ""
    try:
        for token in engine.stream_response(message_history):
            response += token
    except Exception as e:
        response = f"ERROR: {str(e)}"
        print(f"  Error occurred: {e}")
    
    results.append({
        "query": test['query'],
        "keyword": test['keyword'],
        "expected_recipe": test['recipe_name'],
        "response": response
    })

2026-01-21 18:40:32 - INFO - Inference - Search Query: I want a kosher recipe
2026-01-21 18:40:32 - INFO - haystack.core.pipeline.pipeline - Running component query_embedder
2026-01-21 18:40:39 - INFO - httpx - HTTP Request: POST https://router.huggingface.co/hf-inference/models/BAAI/bge-base-en-v1.5/pipeline/feature-extraction "HTTP/1.1 200 OK"
2026-01-21 18:40:39 - INFO - haystack.core.pipeline.pipeline - Running component retriever
2026-01-21 18:40:39 - INFO - httpx - HTTP Request: POST https://router.huggingface.co/v1/chat/completions "HTTP/1.1 200 OK"
2026-01-21 18:40:39 - INFO - Inference - Extracted keywords: ['kosher']
2026-01-21 18:40:39 - INFO - Inference - Filtered down to 4 documents based on keywords.
2026-01-21 18:40:39 - INFO - Inference - Context: ['meat or mushroom kubbeh', 'lentil soup in 10 minutes  pressure cooker', 'ruth s homemade vanilla extract', 'roasted beets and carrots from cooks illustrated']
2026-01-21 18:40:40 - INFO - httpx - HTTP Request: POST https://r
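
The streaming loop appears twice in this notebook (for the single query and for the full run), so it could be factored into a helper. The following is a minimal sketch: `collect_response` is a hypothetical helper, and `FakeEngine` and `FailingEngine` are stand-ins that only imitate the `stream_response(message_history)` call seen above so the helper can be exercised without the real `InferenceEngine`.

```python
def collect_response(engine, query):
    """Join the streamed tokens for one query; return an error marker on failure."""
    message_history = [{"role": "user", "content": query}]
    try:
        return "".join(engine.stream_response(message_history))
    except Exception as exc:
        # Same convention as the loop above: the whole response becomes an error string
        return f"ERROR: {exc}"

class FakeEngine:
    """Hypothetical stand-in that streams a fixed list of tokens."""
    def stream_response(self, message_history):
        for token in ["Here ", "is ", "a ", "recipe"]:
            yield token

class FailingEngine:
    """Hypothetical stand-in that fails in the middle of the stream."""
    def stream_response(self, message_history):
        yield "partial "
        raise RuntimeError("connection lost")

ok_response = collect_response(FakeEngine(), "I want a kosher recipe")
failed_response = collect_response(FailingEngine(), "I want a kosher recipe")
print(ok_response)
print(failed_response)
```

As in the original loop, tokens received before an exception are discarded and replaced by the error string.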

In [8]:
results_df = pd.DataFrame(results)
results_df

| | query | keyword | expected_recipe | response |
|---|---|---|---|---|
| 0 | I want a kosher recipe | kosher | low carb chicken salad | Based on the provided context, here is a koshe... |
| 1 | I have orange peel, what can I make that's low... | low-carb | pork chops in orange sauce | Based on the ingredients you have (orange peel... |
| 2 | I want a quick kosher recipe | kosher | vegetarian cassoulet | Based on the provided context, here is a quick... |
| 3 | I need cheap meal ideas | inexpensive | hawaiian pizzas | Based on the provided context, here are a coup... |
| 4 | I'm trying to be healthy, show me recipes with... | healthy | creamy souper rice | Based on the provided context, here is a recip... |
| ... | ... | ... | ... | ... |
| 75 | I want a inexpensive recipe using fresh ground... | inexpensive | tuna fish salad | Based on the provided context, here is a recip... |
| 76 | What's a good 15-minutes-or-less meal? | 15-minutes-or-less | rich butterscotch sauce | Based on the provided context, here's a quick ... |
| 77 | Can you suggest gluten-free food? | gluten-free | pamela s gluten free brownies | Certainly! Here's a gluten-free recipe for **G... |
| 78 | I need quick meal ideas | 15-minutes-or-less | metabolism booster hot n spicy v 8 | Based on the provided CONTEXT, here are two qu... |


In [None]:
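# Save results to CSV (create the output directory if it does not exist yet)
Path('results').mkdir(parents=True, exist_ok=True)
results_df.to_csv('results/test_query_results.csv', index=False)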

In [10]:
# Read and display saved results
saved_results = pd.read_csv('results/test_query_results.csv')
saved_results

| | query | keyword | expected_recipe | response |
|---|---|---|---|---|
| 0 | I want a kosher recipe | kosher | low carb chicken salad | Based on the provided context, here is a koshe... |
| 1 | I have orange peel, what can I make that's low... | low-carb | pork chops in orange sauce | Based on the ingredients you have (orange peel... |
| 2 | I want a quick kosher recipe | kosher | vegetarian cassoulet | Based on the provided context, here is a quick... |
| 3 | I need cheap meal ideas | inexpensive | hawaiian pizzas | Based on the provided context, here are a coup... |
| 4 | I'm trying to be healthy, show me recipes with... | healthy | creamy souper rice | Based on the provided context, here is a recip... |
| ... | ... | ... | ... | ... |
| 75 | I want a inexpensive recipe using fresh ground... | inexpensive | tuna fish salad | Based on the provided context, here is a recip... |
| 76 | What's a good 15-minutes-or-less meal? | 15-minutes-or-less | rich butterscotch sauce | Based on the provided context, here's a quick ... |
| 77 | Can you suggest gluten-free food? | gluten-free | pamela s gluten free brownies | Certainly! Here's a gluten-free recipe for **G... |
| 78 | I need quick meal ideas | 15-minutes-or-less | metabolism booster hot n spicy v 8 | Based on the provided CONTEXT, here are two qu... |
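
One simple way to summarize these results is to check how often the response mentions the expected recipe by name. The sketch below works on the `results` list of dictionaries built in the run cell; `hit_rates`, `normalize`, and the demo rows are hypothetical. Whitespace is normalized because recipe names in the logged context contain double spaces (for example `lentil soup in 10 minutes  pressure cooker`) while the generated answer uses single spaces.

```python
from collections import defaultdict

def normalize(text):
    """Lowercase and collapse runs of whitespace so names compare reliably."""
    return " ".join(str(text).lower().split())

def hit_rates(results):
    """Share of responses naming the expected recipe, overall and per keyword."""
    hits = defaultdict(list)
    for row in results:
        hit = normalize(row["expected_recipe"]) in normalize(row["response"])
        hits[row["keyword"]].append(hit)
    per_keyword = {kw: sum(flags) / len(flags) for kw, flags in hits.items()}
    all_hits = [flag for flags in hits.values() for flag in flags]
    overall = sum(all_hits) / len(all_hits) if all_hits else 0.0
    return overall, per_keyword

# Made-up rows in the same shape as the notebook's `results` entries
demo_results = [
    {"keyword": "vegan", "expected_recipe": "lentil  stew", "response": "Try this Lentil Stew tonight"},
    {"keyword": "vegan", "expected_recipe": "tofu curry", "response": "Here is a bean salad"},
    {"keyword": "kosher", "expected_recipe": "beet salad", "response": "ERROR: timeout"},
]

overall, per_keyword = hit_rates(demo_results)
print(overall, per_keyword)
```

A low hit rate is not necessarily a failure: each keyword matches many recipes, so the engine can return a valid recipe other than the sampled one, as in row 0, where the expected recipe is `low carb chicken salad` and the answer is a lentil soup.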
