# 3-Step Query Generation Notebook

A three-step approach to generating realistic food delivery queries:
1. **Step 1:** Generate pure user intents (no food bias)
2. **Step 2:** Match foods to best-fitting intents (1:1 mapping)
3. **Step 3:** Generate final queries that bridge intent → food

In [2]:
response = openai.chat.completions.create(
    model="gpt-5",
    messages=[{"role": "user", "content": "cheese pizza varieties"}],
    temperature=1.0,
    max_completion_tokens=128000,
)

In [3]:
response

ChatCompletion(id='chatcmpl-CIeHPSXWZf8dUhyUqOVpqvCQ09eMv', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='Here are popular cheese pizza varieties, grouped by style and cheese profile:\n\nClassic tomato-based (red) pies\n- Margherita (Neapolitan): Fresh mozzarella (fior di latte or buffalo), tomato, basil.\n- New York–style cheese: Light tomato sauce, low-moisture mozzarella, big foldable slices.\n- Sicilian/Grandma: Pan-baked; Sicilian is thicker and airy, Grandma is thinner and crisper. Mozzarella under or over the sauce.\n- Detroit–style: Rectangular, airy crumb, Wisconsin brick cheese to the edges for a caramelized frico crust; light sauce stripes.\n- Chicago deep-dish cheese: Mozzarella (often in slabs) under a chunky tomato sauce; sometimes with provolone blend.\n- St. Louis–style: Cracker-thin with Provel (processed cheddar–provolone–Swiss blend), cut into squares.\n- Roman tonda rossa: Very thin, crisp; fresh or lo

In [6]:
import json
import os
from datetime import datetime

import openai
import pandas as pd
from dotenv import load_dotenv

load_dotenv()
openai.api_key = os.getenv("OPENAI_API_KEY")
# Never print the key itself; report only whether it was loaded
print(f"OpenAI API Key: {'[REDACTED]' if openai.api_key else 'missing'}")


def call_openai(prompt: str, model: str = "gpt-5") -> "str | None":
    """Call the chat completions API; return the reply text, or None on error."""
    try:
        response = openai.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            temperature=1.0,
            max_completion_tokens=128000,
        )
        return response.choices[0].message.content
    except Exception as e:
        print(f"OpenAI API error: {e}")
        return None

OpenAI API Key: [REDACTED]
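
`call_openai` returns `None` whenever the request fails, so a later `intent_response.split("\n")` would raise `AttributeError`. One way to guard against that is a small retry wrapper. The sketch below is an assumption, not part of the original notebook: `call_with_retry`, `max_attempts`, and `base_delay` are hypothetical names, and the backoff schedule is only illustrative.

```python
import time
from typing import Callable, Optional


def call_with_retry(
    fn: Callable[[], Optional[str]],
    max_attempts: int = 3,
    base_delay: float = 1.0,
) -> str:
    """Call fn until it returns a non-empty string, backing off between tries."""
    for attempt in range(1, max_attempts + 1):
        result = fn()
        if result:  # None or "" counts as a failed call
            return result
        if attempt < max_attempts:
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError(f"No response after {max_attempts} attempts")
```

In this notebook it could wrap any step, for example `call_with_retry(lambda: call_openai(intent_prompt))`, so a failure surfaces as one clear exception instead of a `None` leaking into the parsing code.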


## Step 1: Generate Pure User Intents

In [7]:
# Load the intent generation prompt
with open("../prompts/query_generation/v1.1_intent_generation.txt", encoding="utf-8") as f:
    intent_prompt = f.read()

print("🚀 Step 1: Generating pure user intents...")
intent_response = call_openai(intent_prompt)

# Parse intents
intents = [line.strip() for line in intent_response.split("\n") if line.strip()]
print(f"Generated {len(intents)} intents")

# Display first 10 intents
print("\nFirst 10 intents:")
for i, intent in enumerate(intents[:10], 1):
    print(f"{i:2d}. {intent}")

print(f"\n... and {max(len(intents) - 10, 0)} more")

🚀 Step 1: Generating pure user intents...
Generated 50 intents

First 10 intents:
 1. something filling under $12
 2. cheap dinner for one
 3. healthy lunch under 30 min
 4. light dinner options
 5. high protein meal quick n easy
 6. comfort food please
 7. hangover food delivery
 8. somethin greasy and fast
 9. low-carb dinner tonight
10. big portions cheap eats pls

... and 40 more
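
The line-by-line parse above assumes the model returns exactly one bare intent per line. If a response ever comes back numbered, bulleted, or with repeats, a slightly more defensive parser may help. This is a minimal sketch under that assumption; `clean_intents` is a hypothetical helper, not something the notebook defines.

```python
import re


def clean_intents(raw: str) -> list[str]:
    """Split a model response into unique intents, one per line."""
    seen = set()
    cleaned = []
    for line in raw.splitlines():
        # Drop leading list markers such as "1. ", "2) ", "- " or "* "
        text = re.sub(r"^\s*(?:\d+[.)]|[-*•])\s+", "", line).strip()
        key = text.lower()  # compare case-insensitively when de-duplicating
        if text and key not in seen:
            seen.add(key)
            cleaned.append(text)
    return cleaned
```

Intents that merely start with a digit or contain a hyphen (for example `low-carb dinner tonight`) are left untouched, because the pattern only strips a marker that is followed by whitespace.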


In [8]:
intent_response

'something filling under $12\ncheap dinner for one\nhealthy lunch under 30 min\nlight dinner options\nhigh protein meal quick n easy\ncomfort food please\nhangover food delivery\nsomethin greasy and fast\nlow-carb dinner tonight\nbig portions cheap eats pls\ntreat myself dinner tonight\ndairy-free lunch options\nbalanced meal with veggies\nhearty meal after gym\ncozy warm soup vibes\nmovie night snacks delivery\noffice lunch for team\ndate night dinner for two\nlate night food near me\nsunday brunch delivery now\nfood for 8 people\nkids meal options for picky eaters\ngame day finger foods\nstudy group snacks\nbirthday dinner at home\nrainy day comfort food\nquick breakfast to my door\nfast delivery under 25 minutes\nfree delivery near me\ngood eats open now near me\nmexican food near me\nnear campus cheap eats\ndelivers to hotel\ncontactless delivery meals\nfamily bundles under $30\nvalue meals under $15\nbulk order discounts\nno delivery fee deals\nbest rated spots near me\nsomething 

## Test Food Dataset

In [9]:
# Create test food dataset
test_foods = [
    {
        "id": 21496,
        "consumable_name": "Fried Potato Slice",
        "consumable_ingredients": "potato, oil, salt",
    },
    {
        "id": 6473,
        "consumable_name": "Cheeseburger Meal",
        "consumable_ingredients": "beef, cheese, bun, lettuce",
    },
    {
        "id": 2584,
        "consumable_name": "Grilled Fish with Noodles",
        "consumable_ingredients": "fish, noodles, vegetables",
    },
    {
        "id": 5706,
        "consumable_name": "Fresh Greens",
        "consumable_ingredients": "lettuce, spinach, herbs",
    },
    {
        "id": 228,
        "consumable_name": "Hot Pot",
        "consumable_ingredients": "broth, meat, vegetables, spices",
    },
    {
        "id": 12128,
        "consumable_name": "Fig and Burrata Salad",
        "consumable_ingredients": "figs, burrata, greens, pistachios",
    },
    {
        "id": 9215,
        "consumable_name": "Korean Fried Chicken",
        "consumable_ingredients": "chicken, sauce, pickled radish",
    },
    {
        "id": 1246,
        "consumable_name": "Fruit Cake",
        "consumable_ingredients": "flour, fruit, nuts, spices",
    },
]

food_df = pd.DataFrame(test_foods)
print(f"Test dataset: {len(food_df)} food items")
display(food_df)

Test dataset: 8 food items


| | id | consumable_name | consumable_ingredients |
|---|---|---|---|
| 0 | 21496 | Fried Potato Slice | potato, oil, salt |
| 1 | 6473 | Cheeseburger Meal | beef, cheese, bun, lettuce |
| 2 | 2584 | Grilled Fish with Noodles | fish, noodles, vegetables |
| 3 | 5706 | Fresh Greens | lettuce, spinach, herbs |
| 4 | 228 | Hot Pot | broth, meat, vegetables, spices |
| 5 | 12128 | Fig and Burrata Salad | figs, burrata, greens, pistachios |
| 6 | 9215 | Korean Fried Chicken | chicken, sauce, pickled radish |
| 7 | 1246 | Fruit Cake | flour, fruit, nuts, spices |


## Step 2: Smart Intent-to-Food Matching

In [10]:
# Create matching prompt
matching_prompt = f"""
You are a food delivery expert. Your task is to match food items to the most appropriate user search intents.

**USER INTENTS AVAILABLE:**
{chr(10).join([f"{i + 1}. {intent}" for i, intent in enumerate(intents)])}

**FOOD ITEMS TO MATCH:**
{food_df.to_markdown(index=False)}

**MATCHING RULES:**
1. Each food item should be matched to exactly ONE intent that makes the most logical sense
2. Each intent can only be used once (or left unused if no good match)
3. If a food doesn't match any intent well, skip it
4. Prioritize realistic, natural connections over forced matches
5. Consider the food's actual properties (protein content, price range, preparation style)

**OUTPUT FORMAT:**
Return a JSON object with the matches:
```json
{{
  "matches": [
    {{"food_id": 21496, "food_name": "Fried Potato Slice", "intent": "cheap eats near me", "reasoning": "Potato slices are typically affordable and crispy"}},
    {{"food_id": 6473, "food_name": "Cheeseburger Meal", "intent": "quick and greasy dinner", "reasoning": "Cheeseburgers are classic quick, greasy comfort food"}}
  ]
}}
```

Match each food to the best intent, providing reasoning for each match.
"""

print("🎯 Step 2: Matching foods to best intents...")
matching_response = call_openai(matching_prompt)

# Parse the matching response
try:
    json_start = matching_response.find("{")
    json_end = matching_response.rfind("}") + 1
    json_str = matching_response[json_start:json_end]
    matches = json.loads(json_str)

    print(f"Successfully matched {len(matches['matches'])} foods to intents:\n")
    for match in matches["matches"]:
        print(f"🍽️  {match['food_name']}")
        print(f"🔍 Intent: {match['intent']}")
        print(f"💭 Reasoning: {match['reasoning']}")
        print()

except Exception as e:
    print(f"Error parsing matches: {e}")
    print(f"Raw response: {matching_response}")

🎯 Step 2: Matching foods to best intents...
Successfully matched 8 foods to intents:

🍽️  Fried Potato Slice
🔍 Intent: game day finger foods
💭 Reasoning: Crispy, shareable snack that’s perfect for munching during the game.

🍽️  Cheeseburger Meal
🔍 Intent: somethin greasy and fast
💭 Reasoning: Classic quick-serve, greasy comfort pick with fries and a drink.

🍽️  Grilled Fish with Noodles
🔍 Intent: hearty meal after gym
💭 Reasoning: Lean protein with carbs and veggies for solid post-workout recovery.

🍽️  Fresh Greens
🔍 Intent: vegan dinner options
💭 Reasoning: All plant-based greens; a light vegan-friendly meal option.

🍽️  Hot Pot
🔍 Intent: cozy warm soup vibes
💭 Reasoning: Steaming broth with meats and veggies hits the cozy soup craving.

🍽️  Fig and Burrata Salad
🔍 Intent: treat myself dinner tonight
💭 Reasoning: Premium ingredients (burrata, figs, pistachios) feel indulgent and special.

🍽️  Korean Fried Chi
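
The matching rules in the prompt (one intent per food, each intent used at most once) are only instructions to the model; nothing in the notebook verifies that the response follows them. A minimal sketch of such a check, assuming the parsed `matches["matches"]` list and the `intents` list from Step 1, could look like this. `validate_matches` is a hypothetical helper.

```python
def validate_matches(
    matches: list[dict], intents: list[str], food_ids: set[int]
) -> list[str]:
    """Return a list of rule violations; an empty list means the matches pass."""
    problems = []
    seen_intents = set()
    seen_foods = set()
    for m in matches:
        intent, food_id = m.get("intent"), m.get("food_id")
        # The intent must come from Step 1 and be used at most once
        if intent not in intents:
            problems.append(f"unknown intent: {intent!r}")
        elif intent in seen_intents:
            problems.append(f"intent used more than once: {intent!r}")
        # The food must exist in the dataset and be matched at most once
        if food_id not in food_ids:
            problems.append(f"unknown food_id: {food_id!r}")
        elif food_id in seen_foods:
            problems.append(f"food matched more than once: {food_id!r}")
        seen_intents.add(intent)
        seen_foods.add(food_id)
    return problems
```

Here it might be called as `validate_matches(matches["matches"], intents, set(food_df["id"]))`, which would also flag any intent the model paraphrased or invented rather than copied from the list.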

In [12]:
print(matching_response)

{
  "matches": [
    {
      "food_id": 21496,
      "food_name": "Fried Potato Slice",
      "intent": "game day finger foods",
      "reasoning": "Crispy, shareable snack that’s perfect for munching during the game."
    },
    {
      "food_id": 6473,
      "food_name": "Cheeseburger Meal",
      "intent": "somethin greasy and fast",
      "reasoning": "Classic quick-serve, greasy comfort pick with fries and a drink."
    },
    {
      "food_id": 2584,
      "food_name": "Grilled Fish with Noodles",
      "intent": "hearty meal after gym",
      "reasoning": "Lean protein with carbs and veggies for solid post-workout recovery."
    },
    {
      "food_id": 5706,
      "food_name": "Fresh Greens",
      "intent": "vegan dinner options",
      "reasoning": "All plant-based greens; a light vegan-friendly meal option."
    },
    {
      "food_id": 228,
      "food_name": "Hot Pot",
      "intent": "cozy warm soup vibes",
      "reasoning": "Steaming broth with meats and veggies hit

## Step 3: Generate Final Queries

In [13]:
# Generate final queries for ALL matched pairs in one API call
batch_query_prompt = """
You are a food delivery expert. For each intent-food pair below, generate 2 realistic search queries that would naturally bridge from the user's original intent to the specific food.

**INTENT-FOOD PAIRS:**
"""

for i, match in enumerate(matches["matches"], 1):
    batch_query_prompt += f"""
{i}. Intent: "{match["intent"]}"
   Food: {match["food_name"]} (ID: {match["food_id"]})
"""

batch_query_prompt += """

**REQUIREMENTS:**
- Generate 2 queries per food item
- Preserve the user's original intent and context
- Use authentic language that real users would type
- Make the progression feel natural, not forced
- Avoid being too specific unless it feels natural

**OUTPUT FORMAT:**
Return as JSON:
```json
{
  "query_results": [
    {
      "food_id": 21496,
      "food_name": "Fried Potato Slice",
      "original_intent": "game day finger foods",
      "queries": [
        "crispy finger foods for game day",
        "crunchy snacks for watching sports"
      ]
    }
  ]
}
```

Generate 2 queries for each food item that naturally bridge from intent to food.
"""

print("🚀 Step 3: Generating ALL final queries in one API call...")
batch_response = call_openai(batch_query_prompt)

# Parse the batch response
try:
    json_start = batch_response.find("{")
    json_end = batch_response.rfind("}") + 1
    json_str = batch_response[json_start:json_end]
    query_results = json.loads(json_str)

    # Convert to our format
    final_queries = []
    for result in query_results["query_results"]:
        for query in result["queries"]:
            final_queries.append(
                {
                    "original_intent": result["original_intent"],
                    "food_id": result["food_id"],
                    "food_name": result["food_name"],
                    "final_query": query,
                    "generated_at": datetime.now().isoformat(),
                }
            )

    print(f"✅ Generated {len(final_queries)} final queries in one call!")

except Exception as e:
    print(f"Error parsing batch response: {e}")
    print(f"Raw response: {batch_response}")
    # Fall back to empty list if parsing fails
    final_queries = []

🚀 Step 3: Generating ALL final queries in one API call...
✅ Generated 16 final queries in one call!
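
Steps 2 and 3 both locate the JSON by slicing from the first `{` to the last `}`. That works for the responses shown here, but it breaks if the model adds trailing prose that happens to contain braces. A possible alternative, sketched below with the standard library's `json.JSONDecoder.raw_decode`, parses the first complete object and ignores whatever follows. `extract_json_object` is a hypothetical name.

```python
import json
from typing import Optional


def extract_json_object(text: Optional[str]) -> Optional[dict]:
    """Return the first JSON object found in text, or None if there is none."""
    if not text:
        return None
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one value starting at `start` and ignores the rest
            obj, _ = decoder.raw_decode(text, start)
            if isinstance(obj, dict):
                return obj
        except json.JSONDecodeError:
            pass  # not valid JSON from this brace; try the next one
        start = text.find("{", start + 1)
    return None
```

Because it returns `None` instead of raising, the caller can decide whether to retry the request or fall back to an empty result, as the `except` branch above already does.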


## Results Analysis

In [14]:
# Convert to DataFrame for analysis
results_df = pd.DataFrame(final_queries)

print("🎉 FINAL RESULTS:")
print("=" * 50)

for intent in results_df["original_intent"].unique():
    intent_results = results_df[results_df["original_intent"] == intent]
    print(f"\n🔍 Original Intent: '{intent}'")
    food_name = intent_results.iloc[0]["food_name"]
    print(f"🍽️  Matched Food: {food_name}")
    print("📝 Generated Queries:")
    for _, row in intent_results.iterrows():
        print(f"   • {row['final_query']}")

print(
    f"\n📊 Summary: {len(results_df)} total queries for {len(results_df['food_name'].unique())} foods"
)

🎉 FINAL RESULTS:

🔍 Original Intent: 'game day finger foods'
🍽️  Matched Food: Fried Potato Slice
📝 Generated Queries:
   • game day crispy potato slices
   • fried potato finger foods for the game

🔍 Original Intent: 'somethin greasy and fast'
🍽️  Matched Food: Cheeseburger Meal
📝 Generated Queries:
   • greasy cheeseburger combo near me
   • fast burger meal with fries

🔍 Original Intent: 'hearty meal after gym'
🍽️  Matched Food: Grilled Fish with Noodles
📝 Generated Queries:
   • post workout grilled fish noodle bowl
   • high protein noodles with grilled fish

🔍 Original Intent: 'vegan dinner options'
🍽️  Matched Food: Fresh Greens
📝 Generated Queries:
   • vegan fresh greens salad for dinner
   • plant based greens bowl near me

🔍 Original Intent: 'cozy warm soup vibes'
🍽️  Matched Food: Hot Pot
📝 Generated Queries:
   • cozy hot pot for a warm soup night
   • steaming hot pot delivery for cozy vibes


In [15]:
results_df

| | original_intent | food_id | food_name | final_query | generated_at |
|---|---|---|---|---|---|
| 0 | game day finger foods | 21496 | Fried Potato Slice | game day crispy potato slices | 2025-09-22T21:14:42.035963 |
| 1 | game day finger foods | 21496 | Fried Potato Slice | fried potato finger foods for the game | 2025-09-22T21:14:42.036029 |
| 2 | somethin greasy and fast | 6473 | Cheeseburger Meal | greasy cheeseburger combo near me | 2025-09-22T21:14:42.036032 |
| 3 | somethin greasy and fast | 6473 | Cheeseburger Meal | fast burger meal with fries | 2025-09-22T21:14:42.036034 |
| 4 | hearty meal after gym | 2584 | Grilled Fish with Noodles | post workout grilled fish noodle bowl | 2025-09-22T21:14:42.036036 |
| 5 | hearty meal after gym | 2584 | Grilled Fish with Noodles | high protein noodles with grilled fish | 2025-09-22T21:14:42.036039 |
| 6 | vegan dinner options | 5706 | Fresh Greens | vegan fresh greens salad for dinner | 2025-09-22T21:14:42.036041 |
| 7 | vegan dinner options | 5706 | Fresh Greens | plant based greens bowl near me | 2025-09-22T21:14:42.036043 |
| 8 | cozy warm soup vibes | 228 | Hot Pot | cozy hot pot for a warm soup night | 2025-09-22T21:14:42.036045 |
| 9 | cozy warm soup vibes | 228 | Hot Pot | steaming hot pot delivery for cozy vibes | 2025-09-22T21:14:42.036047 |
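
Step 3 asks the model for a fixed number of queries per food, but the notebook accepts whatever comes back. A small check on the parsed `query_results` dictionary can confirm that every matched food received the expected count. This is a sketch only; `check_query_counts` and its `per_food` parameter are hypothetical.

```python
def check_query_counts(
    query_results: dict, expected_ids: set[int], per_food: int = 2
) -> list[str]:
    """Report foods with the wrong number of queries or with none at all."""
    problems = []
    returned_ids = set()
    for result in query_results.get("query_results", []):
        food_id = result.get("food_id")
        returned_ids.add(food_id)
        n = len(result.get("queries", []))
        if n != per_food:
            problems.append(f"food {food_id}: expected {per_food} queries, got {n}")
    # Foods that were matched in Step 2 but are absent from the Step 3 response
    for missing in sorted(expected_ids - returned_ids):
        problems.append(f"food {missing}: no queries returned")
    return problems
```

A call such as `check_query_counts(query_results, {m["food_id"] for m in matches["matches"]})` returns an empty list when the response is complete.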


## Save Results

In [None]:
# Save results
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
output_path = f"../output/3step_smart_results_{timestamp}.csv"

results_df.to_csv(output_path, index=False)
print(f"Results saved to: {output_path}")

# Also save the original intents for reference
intents_path = f"../output/3step_intents_{timestamp}.txt"
with open(intents_path, "w", encoding="utf-8") as f:
    f.write("ORIGINAL USER INTENTS GENERATED (Step 1):\n")
    f.write("=" * 50 + "\n")
    for i, intent in enumerate(intents, 1):
        f.write(f"{i}. {intent}\n")

print(f"Original intents saved to: {intents_path}")

# Display final DataFrame
display(results_df)
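
The save cell assumes that `../output` already exists; if it does not, both `to_csv` and `open` fail with `FileNotFoundError`. A minimal sketch of a helper that creates the directory first and builds the same timestamped file name follows. `timestamped_path` is a hypothetical name, not part of the notebook.

```python
from datetime import datetime
from pathlib import Path


def timestamped_path(directory: str, stem: str, suffix: str) -> Path:
    """Create directory if needed and return directory/stem_YYYYmmdd_HHMMSS + suffix."""
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    return out_dir / f"{stem}_{timestamp}{suffix}"
```

With it, the first save could become `results_df.to_csv(timestamped_path("../output", "3step_smart_results", ".csv"), index=False)`.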