# Part 1: Define Dimensions & Generate Initial Queries


In this notebook, we will cover:

- Identifying the key dimensions of our user queries
- Generating realistic, unique combinations of dimension values as tuples
- Using the dimension tuples to generate synthetic queries
- Adding our synthetic queries to those provided by our domain experts

Let's start by importing the required libraries.


In [None]:
import json
import os
import sys

sys.path.append(os.path.abspath(".."))
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime
from functools import partial
from textwrap import dedent

import braintrust as bt
import openai as oai
import requests

from dotenv import load_dotenv
from pydantic import BaseModel, Field

load_dotenv(override=True)

True

## Setup

Once you're signed up on [Braintrust](https://www.braintrust.dev/) and have set the appropriate API keys in your `.env` file, you're ready to go. For this particular demo, you'll need the following API keys:

- `BRAINTRUST_API_KEY`
- `OPENAI_API_KEY`


In [2]:
BT_PROJECT_NAME = "recipe-bot"
MAX_WORKERS = 5

bt_project = bt.projects.create(name=BT_PROJECT_NAME)
oai_client = oai.OpenAI(base_url="https://api.braintrust.dev/v1/proxy", api_key=os.getenv("OPENAI_API_KEY"))


**Key takeaways**:

1. We use `braintrust.projects.create()` to create or fetch a Braintrust project.
2. We use our [AI proxy](https://www.braintrust.dev/docs/guides/proxy) to simplify our LLM calls and take advantage of automatic caching.


## Step 1: Identify Key Dimensions

> "First, before prompting anything, we define key dimensions of the query space... [to] help us systematically vary different aspects of a user’s request"


### Prompt


In [4]:
TUPLES_GEN_PROMPT = """\
I am designing a customer support chatbot for the city of Encinitas and I want to test it against a diverse
range of inquiries citizens might submit. I have provided you with several dimensions and that constitute the parts of
such a queries along with a list of possible values for each dimension.

## Instructions

Generate {{{num_tuples_to_generate}}} unique combinations of dimension values for based on the dimensions provided below. 
- Each combination should represent a different user scenario. 
- Ensure balanced coverage across all dimensions - don't over-represent any particular value or combination.
- Vary the query styles naturally.
- Attempt to make the dimension value combinations as realistic as possible.

## Dimensions

request_type:
- recipe_request: [asking for a specific recipe or recipe suggestions]
- cooking_technique: [asking how to cook something or cooking methods]
- ingredient_substitution: [need alternatives for missing ingredients]
- nutritional_guidance: [focused on protein, calories, healthy options]
- meal_planning: [meal prep, batch cooking, planning ahead]

dietary_goal:
- bulking: [building muscle, high calories, high protein]
- cutting: [weight loss, low calorie, filling foods]
- general_health: [balanced, nutritious, everyday cooking]
- convenience: [quick, easy, practical cooking]
- no_specific_goal: [just looking for tasty food]

ingredient_specificity:
- specific_ingredients: [specific ingredients like chicken, beef, fish, etc.]
- general_ingredients: [general ingredients like chicken, beef, fish, etc.]
- no_specific_ingredients: [no specific ingredients, just a general request]

time_commitment:
- quick_20_min
- moderate_1_hour
- meal_prep_batch
- no_time_constraint

specificity_level:
- very_specific: [detailed requirements: exact ingredients, stores to get ingredients from, protein amounts]
- moderately_specific: [some constraints: ingredient type, cooking method, general nutrition]
- general_request: [broad categories: "high protein lunch", "healthy dinner"]
- minimal_detail: [very brief: "chicken recipe", "meal prep ideas"]

store_requirement:
- specific_stores: [Costco, Trader Joes, Vons]
- no_specific_stores

Generate {{{num_tuples_to_generate}}} unique dimension tuples following these patterns. Remember to maintain balanced diversity across all dimensions."""
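
To get a feel for the size of the query space the prompt above describes, here is a small sketch (not part of the original notebook) that enumerates the full cross product of the six dimensions. `DIMENSIONS` and `all_combinations` are illustrative names; the values are copied from the prompt.

```python
from itertools import product

# Dimension values copied from the tuple-generation prompt (names are illustrative).
DIMENSIONS = {
    "request_type": ["recipe_request", "cooking_technique", "ingredient_substitution",
                     "nutritional_guidance", "meal_planning"],
    "dietary_goal": ["bulking", "cutting", "general_health", "convenience", "no_specific_goal"],
    "ingredient_specificity": ["specific_ingredients", "general_ingredients", "no_specific_ingredients"],
    "time_commitment": ["quick_20_min", "moderate_1_hour", "meal_prep_batch", "no_time_constraint"],
    "specificity_level": ["very_specific", "moderately_specific", "general_request", "minimal_detail"],
    "store_requirement": ["specific_stores", "no_specific_stores"],
}


def all_combinations(dimensions):
    """Return every possible combination as a list of dicts keyed by dimension name."""
    names = list(dimensions)
    return [dict(zip(names, values)) for values in product(*dimensions.values())]


combos = all_combinations(DIMENSIONS)
print(len(combos))  # 5 * 5 * 3 * 4 * 4 * 2 = 2400
```

Many of these 2,400 combinations are unlikely to occur in practice, which is one reason to ask an LLM for a small, realistic subset rather than enumerating them all.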


In [5]:
# print(TUPLES_GEN_PROMPT.replace("{{{num_tuples_to_generate}}}", "10"))

Create a [versioned prompt](https://www.braintrust.dev/docs/guides/functions/prompts) and save it to Braintrust.


In [6]:
tuples_gen_prompt = bt.load_prompt(project=BT_PROJECT_NAME, slug="dimension-tuples-gen-prompt")
try:
    # Building the prompt forces it to load, so this fails if the prompt has not been created yet.
    tuples_gen_prompt.build(num_tuples_to_generate=20)
except Exception:
    bt_tuples_gen_prompt = bt_project.prompts.create(
        name="DimensionTuplesGenPrompt",
        slug="dimension-tuples-gen-prompt",
        description="Prompt for generating dimension tuples",
        model="claude-4-sonnet-20250514",
        messages=[{"role": "user", "content": TUPLES_GEN_PROMPT}],
        if_exists="replace",
    )

    bt_project.publish()
    tuples_gen_prompt = bt.load_prompt(project=BT_PROJECT_NAME, slug="dimension-tuples-gen-prompt")


In [7]:
_p = tuples_gen_prompt.build(num_tuples_to_generate=20)
print(_p["messages"][0]["content"])

I am designing a customer support chatbot for the city of Encinitas and I want to test it against a diverse
range of inquiries citizens might submit. I have provided you with several dimensions and that constitute the parts of
such a queries along with a list of possible values for each dimension.

## Instructions

Generate 20 unique combinations of dimension values for based on the dimensions provided below. 
- Each combination should represent a different user scenario. 
- Ensure balanced coverage across all dimensions - don't over-represent any particular value or combination.
- Vary the query styles naturally.
- Attempt to make the dimension value combinations as realistic as possible.

## Dimensions

request_type:
- recipe_request: [asking for a specific recipe or recipe suggestions]
- cooking_technique: [asking how to cook something or cooking methods]
- ingredient_substitution: [need alternatives for missing ingredients]
- nutritional_guidance: [focused on protein, calories, hea

**Key takeaways**:

1. We can create and manage versions of prompts using the SDK or through the UI.
2. We can retrieve any prompt version via the SDK for use in our code (the latest version is returned by default).
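
The prompts in this notebook use Mustache-style triple-brace placeholders such as `{{{num_tuples_to_generate}}}`. As a rough mental model of what happens to them when the prompt is built, here is a sketch based on plain regex substitution. It is not Braintrust's implementation, and `render_template` is a hypothetical helper.

```python
import re


def render_template(template: str, **variables) -> str:
    """Replace each {{{name}}} placeholder with str(variables[name])."""

    def substitute(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"missing template variable: {name}")
        return str(variables[name])

    return re.sub(r"\{\{\{\s*(\w+)\s*\}\}\}", substitute, template)


example = "Generate {{{num_tuples_to_generate}}} unique combinations."
rendered = render_template(example, num_tuples_to_generate=20)
print(rendered)  # Generate 20 unique combinations.
```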


## Step 2: Generate Unique Combinations (Tuples)


In [3]:
class DimensionTuple(BaseModel):
    request_type: str = Field(
        description="What kind of request they are making (e.g. recipe_request, cooking_technique, ingredient_substitution, nutritional_guidance, meal_planning)"
    )
    dietary_goal: str = Field(
        description="Their dietary objective (e.g. bulking, cutting, general_health, convenience, no_specific_goal)",
    )
    ingredient_specificity: str = Field(
        description="How specific their ingredient preferences are (e.g. specific_ingredients, general_ingredients, no_specific_ingredients)"
    )
    time_commitment: str = Field(
        description="How much time they want to spend cooking (e.g. quick_20_min, moderate_1_hour, meal_prep_batch, no_time_constraint)"
    )
    specificity_level: str = Field(
        description="How specific their request is (e.g. very_specific, moderately_specific, general_request, minimal_detail)"
    )
    store_requirement: str = Field(
        description="Whether they have specific store preferences (e.g. specific_stores like Costco, Trader Joe's, Vons, etc., no_specific_stores)"
    )


class DimensionTuples(BaseModel):
    tuples: list[DimensionTuple]

In [8]:
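def generate_synth_data_dimension_tuples(num_tuples: int = 20, model: str = "gpt-4o-mini", model_kwargs: dict | None = None):
    """Generate dimension tuples from the versioned prompt, dedupe them, and log each one to a new experiment."""
    model_kwargs = model_kwargs or {}  # avoid a shared mutable default argument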
    timestamp = datetime.now().strftime("%Y%m%d_%H%M")
    prompt = tuples_gen_prompt.build(num_tuples_to_generate=num_tuples)

    rsp = oai_client.beta.chat.completions.parse(
        model=model,
        messages=prompt["messages"],
        response_format=DimensionTuples,
        **model_kwargs,
    )

    tuples_list: DimensionTuples = rsp.choices[0].message.parsed  # type: ignore

    unique_tuples = []
    seen = set()

    for tup in tuples_list.tuples:
        tuple_str = tup.model_dump_json()
        if tuple_str in seen:
            continue

        seen.add(tuple_str)
        unique_tuples.append(tup)

    bt_experiment = bt.init(project=BT_PROJECT_NAME, experiment=f"synth_tuples_it_{timestamp}")
    for uniq_tup in unique_tuples:
        with bt_experiment.start_span(name="generate_dimension_tuples") as span:
            span.log(input=prompt["messages"], output=uniq_tup, metadata=dict(model=model, model_kwargs=model_kwargs))

    summary = bt_experiment.summarize(summarize_scores=False)
    return summary, rsp.choices[0].message.parsed

**Key takeaways**:

1. We use `braintrust.init()` to manually create a new experiment.
2. We log one span per unique tuple, recording `input`, `output`, and `metadata` on each.
3. We call `summarize()` on the experiment to get a summary of the results.
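
The function above drops repeated tuples by comparing their JSON serialization. The same idea can be sketched for plain dicts using only the standard library; `dedupe_preserving_order` is an illustrative name, and `sort_keys=True` is an added assumption so that key order does not affect the comparison.

```python
import json


def dedupe_preserving_order(records):
    """Drop repeated dicts, keeping the first occurrence of each."""
    seen = set()
    unique = []
    for record in records:
        # sort_keys makes the comparison independent of dict insertion order
        key = json.dumps(record, sort_keys=True)
        if key in seen:
            continue
        seen.add(key)
        unique.append(record)
    return unique


sample = [
    {"request_type": "recipe_request", "dietary_goal": "bulking"},
    {"dietary_goal": "bulking", "request_type": "recipe_request"},  # same content, different key order
    {"request_type": "meal_planning", "dietary_goal": "cutting"},
]
unique_sample = dedupe_preserving_order(sample)
print(len(unique_sample))  # 2
```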


In [9]:
exp_summary, dim_tuples = generate_synth_data_dimension_tuples(20)

print(exp_summary)
print(dim_tuples)


See results for synth_tuples_it_20250728_1954 at https://www.braintrust.dev/app/aie-course-2025/p/recipe-bot/experiments/synth_tuples_it_20250728_1954
tuples=[DimensionTuple(request_type='recipe_request', dietary_goal='bulking', ingredient_specificity='specific_ingredients', time_commitment='moderate_1_hour', specificity_level='very_specific', store_requirement='specific_stores'), DimensionTuple(request_type='cooking_technique', dietary_goal='cutting', ingredient_specificity='general_ingredients', time_commitment='quick_20_min', specificity_level='general_request', store_requirement='no_specific_stores'), DimensionTuple(request_type='ingredient_substitution', dietary_goal='general_health', ingredient_specificity='no_specific_ingredients', time_commitment='no_time_constraint', specificity_level='minimal_detail', store_requirement='specific_stores'), DimensionTuple(request_type='nutritional_guidance', dietary_goal='convenience', ingredient_specificity='general_ingredients', time_commitm



### Remove invalid dimension tuples (human/SME review)

We'll do this in the Braintrust UI with [human review](https://www.braintrust.dev/docs/guides/human-review).


<img src="./data/dim-tuples-human-review.png" width="800"/>

**Key takeaways**:

1. To set up scorers for human review, navigate to "Configuration" > "Human Review".
2. Free-form scorers can be configured to record data in the metadata.
3. Human review scorers can be used in BTQL queries.
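
Before handing tuples to a reviewer, it can help to filter out any that contain values the prompt never offered. A minimal sketch of such a pre-check follows; it is an optional addition, not part of the original workflow, and `ALLOWED_VALUES` and `find_invalid_fields` are illustrative names.

```python
# Allowed values copied from the tuple-generation prompt (names are illustrative).
ALLOWED_VALUES = {
    "request_type": {"recipe_request", "cooking_technique", "ingredient_substitution",
                     "nutritional_guidance", "meal_planning"},
    "dietary_goal": {"bulking", "cutting", "general_health", "convenience", "no_specific_goal"},
    "ingredient_specificity": {"specific_ingredients", "general_ingredients", "no_specific_ingredients"},
    "time_commitment": {"quick_20_min", "moderate_1_hour", "meal_prep_batch", "no_time_constraint"},
    "specificity_level": {"very_specific", "moderately_specific", "general_request", "minimal_detail"},
    "store_requirement": {"specific_stores", "no_specific_stores"},
}


def find_invalid_fields(dim_tuple, allowed=ALLOWED_VALUES):
    """Return names of fields that are missing or hold a value outside the allowed set."""
    return [name for name, values in allowed.items() if dim_tuple.get(name) not in values]


good = {
    "request_type": "nutritional_guidance",
    "dietary_goal": "no_specific_goal",
    "ingredient_specificity": "specific_ingredients",
    "time_commitment": "quick_20_min",
    "specificity_level": "moderately_specific",
    "store_requirement": "specific_stores",
}
bad = dict(good, time_commitment="five_minutes")

print(find_invalid_fields(good))  # []
print(find_invalid_fields(bad))   # ['time_commitment']
```

A check like this only catches malformed values; judging whether a combination is realistic still needs a human reviewer.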


## Step 3: Generate Natural Language User Queries


### Prompt


In [11]:
SYNTH_DATA_GEN_PROMPT = """\
I am designing a recipe chatbot tailored to the specific tastes of my family and I want to test it against a diverse
range of realistic queries.

## Objective

Produce {{{num_queries_to_generate}}} natural language queries based on the query characteristics below:

==== Query Characteristics =====
{{{dimension_tuple_json}}}

## Instructions
1. Naturally incorporate all the dimension values
2. Vary in style and detail level
3. Be realistic and practical
4. If including a store, prefer these options: Costco, Trader Joe's, Vons, Stater Bros, Ralphs, etc.
5. Include natural variations in typing style, such as:
   - Some queries in all lowercase
   - Some with random capitalization
   - Some with common typos
   - Some with missing punctuation
   - Some with extra spaces or missing spaces
   - Some with emojis or text speak

Here are examples of realistic query variations for a request to get a recipe for a bulking meal with general ingredients, very specific requirements, no specific stores, and for a meal prep batch:

Proper formatting:
- "Can you give me a recipe for bulking up with chicken and rice for a meal prep over the next 3 days?"
- "I'm trying to bulk up and I want to do meal prep with a chicken and rice recipe. I'm looking for a recipe that is high in protein."

All lowercase:
- "can you give me a recipe for bulking up with chicken and rice for a meal prep over the next 3 days?"
- "i'm trying to bulk up and i want to do meal prep with a chicken and rice recipe. i'm looking for a recipe that is high in protein."

Random caps:
- "Can you give me a recipe for bulking up with CHICKEN and RICE for a meal prep over the next 3 days?"
- "I'm trying to BULK UP and I want to do meal prep with a chicken and rice recipe. I'm looking for a recipe that is HIGH in protein."

Common typos:
- "give me a recipe for bulkingup with CHICKEN and RICE for a meal prep over the next 3 dayz?"
- "Im tryin to BULK UP and I want to do meal prep w/ a chicken and rice recipe. need high protein."

Missing punctuation:
- "need a bulking recipe for chicken and rice for a meal prep over the next 3 days"
- "Im trying to bulk up and I want to do meal prep with a chicken and rice recipe ... looking for a recipe that is high in protein."

With emojis/text speak:
- "need meal prep recipe for bulking up with chicken and rice for a meal prep over the next 3 days! 🥗"
- "meal prep recipe asap pls for 3 day bulking"

Generate {{{num_queries_to_generate}}} unique queries, varying the text style naturally."""

In [12]:
synth_query_gen_prompt = bt.load_prompt(project=BT_PROJECT_NAME, slug="synth-query-gen-prompt")
try:
    # Accessing `.prompt` forces the prompt to load, so this fails if it has not been created yet.
    synth_query_gen_prompt.prompt
except Exception:
    bt_synth_query_gen_prompt = bt_project.prompts.create(
        name="SynthQueryGenPrompt",
        slug="synth-query-gen-prompt",
        description="Prompt for generating synthetic queries",
        model="claude-4-sonnet-20250514",
        messages=[{"role": "user", "content": SYNTH_DATA_GEN_PROMPT}],
        if_exists="replace",
    )

    bt_project.publish()
    synth_query_gen_prompt = bt.load_prompt(project=BT_PROJECT_NAME, slug="synth-query-gen-prompt")




### Data


We can use the [BTQL](https://www.braintrust.dev/docs/reference/btql) sandbox to build a query that selects the dimension tuples marked "good" during our annotation exercise, then run that query from code to fetch the tuples we'll turn into synthetic queries.


In [10]:
def get_valid_dimension_tuples():
    """Fetch the dimension tuples that reviewers marked as good, following cursors across pages."""
    rows = []
    cursor = None
    while True:
        response = requests.post(
            "https://staging-api.braintrust.dev/btql",
            json={
                "query": dedent("""
                        select: output
                        from: experiment('656767c2-3ca2-43ad-a68d-ae9c43bec41a')
                        filter: scores."is_good" = 1
                """)
                + (f" | cursor: '{cursor}'" if cursor else ""),
                "use_brainstore": True,
                "brainstore_realtime": True,  # Include the latest realtime data, but a bit slower.
            },
            headers={"Authorization": "Bearer " + os.environ["BRAINTRUST_API_KEY"]},
        )
        response.raise_for_status()
        response_json = response.json()
        data = response_json.get("data", [])
        rows.extend(row["output"] for row in data)

        # Keep paging until the API stops returning a cursor (or returns an empty page).
        cursor = response_json.get("cursor")
        if not cursor or not data:
            break

    return rows


valid_dim_tuples = get_valid_dimension_tuples()

print(len(valid_dim_tuples))
valid_dim_tuples[:2]

19


[{'dietary_goal': 'no_specific_goal',
  'ingredient_specificity': 'specific_ingredients',
  'request_type': 'nutritional_guidance',
  'specificity_level': 'moderately_specific',
  'store_requirement': 'specific_stores',
  'time_commitment': 'quick_20_min'},
 {'dietary_goal': 'bulking',
  'ingredient_specificity': 'general_ingredients',
  'request_type': 'ingredient_substitution',
  'specificity_level': 'very_specific',
  'store_requirement': 'no_specific_stores',
  'time_commitment': 'meal_prep_batch'}]

**Key takeaways**:

1. We can use BTQL to query our logs, experiments, and datasets.
2. Use the "BTQL sandbox" to build and test your queries before putting them in code.
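
The BTQL helper above reads a `cursor` from each response. As a sketch of the general cursor-following pattern, independent of any HTTP client, the loop below collects rows from every page. `fetch_all`, `fake_fetch_page`, and the `PAGES` fixture are hypothetical stand-ins for the real request, and the response shape (`data` plus `cursor`) is assumed to match what the helper reads.

```python
def fetch_all(fetch_page):
    """Collect rows from every page, following cursors until none is returned."""
    rows = []
    cursor = None
    while True:
        page = fetch_page(cursor)
        data = page.get("data", [])
        rows.extend(data)
        cursor = page.get("cursor")
        # Stop when there is no next page or the current page is empty.
        if not cursor or not data:
            break
    return rows


# Fake two-page backend used in place of a real API call.
PAGES = {
    None: {"data": [{"output": "a"}, {"output": "b"}], "cursor": "page-2"},
    "page-2": {"data": [{"output": "c"}], "cursor": None},
}


def fake_fetch_page(cursor):
    return PAGES[cursor]


all_rows = fetch_all(fake_fetch_page)
print([row["output"] for row in all_rows])  # ['a', 'b', 'c']
```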


### Generate synthetic queries


In [13]:
class QueryList(BaseModel):
    queries: list[str]

In [19]:
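def generate_synth_queries(dim_tuple: dict, num_queries: int = 5, model: str = "gpt-4o-mini", model_kwargs: dict | None = None) -> dict:
    """Generate natural language queries for a single dimension tuple."""
    model_kwargs = model_kwargs or {}  # avoid a shared mutable default argument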
    prompt = synth_query_gen_prompt.build(
        num_queries_to_generate=num_queries,
        dimension_tuple_json=json.dumps(dim_tuple, indent=2),
    )

    rsp = oai_client.beta.chat.completions.parse(
        model=model,
        messages=prompt["messages"],
        response_format=QueryList,
        **model_kwargs,
    )

    query_list: QueryList = rsp.choices[0].message.parsed  # type: ignore

    return {
        "prompt": prompt["messages"],
        "dimension_tuple": dim_tuple,
        "synth_queries": query_list.queries,
    }

In [21]:
rsp = generate_synth_queries(dim_tuple=valid_dim_tuples[15], num_queries=5)

print(rsp["dimension_tuple"])
for q in rsp["synth_queries"]:
    print(q)

{'dietary_goal': 'convenience', 'ingredient_specificity': 'general_ingredients', 'request_type': 'nutritional_guidance', 'specificity_level': 'moderately_specific', 'store_requirement': 'no_specific_stores', 'time_commitment': 'quick_20_min'}
Can u give me a quick recipe with chicken, veggies, and rice that's easy and healthy?
I need a fast meal prep idea for lunch, no specific stores, something with basic ingridients like pasta or chicken. 🤔
looking for a quick recipe under 20 min to help me eat healthy, general ingredients like beans and whole grains plzz
Quick question - can you suggest a nice recipe for a healthy meal with fish, without needing to go to specific stores?
what's a good recipe i can whip up in 20 min using random ingredients i have at home like eggs and spinach?
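
The prompt asks for variation in typing style. A quick way to see whether a batch actually shows that variation is to tag each query with a few crude heuristics and count the tags. This is a rough sketch and not part of the original notebook: `style_tags` is a hypothetical helper, and the emoji check only detects characters outside the Basic Multilingual Plane.

```python
from collections import Counter


def style_tags(query):
    """Return a set of rough style labels for one query."""
    tags = set()
    letters = [ch for ch in query if ch.isalpha()]
    if letters and all(ch.islower() for ch in letters):
        tags.add("all_lowercase")
    if any(ord(ch) > 0xFFFF for ch in query):
        tags.add("emoji")
    # Ignore trailing emoji and spaces before checking the final character.
    text = query.rstrip()
    while text and ord(text[-1]) > 0xFFFF:
        text = text[:-1].rstrip()
    if text and text[-1] not in ".?!":
        tags.add("no_final_punctuation")
    return tags


# Three of the queries generated above.
sample_queries = [
    "I need a fast meal prep idea for lunch, no specific stores, something with basic ingridients like pasta or chicken. 🤔",
    "looking for a quick recipe under 20 min to help me eat healthy, general ingredients like beans and whole grains plzz",
    "what's a good recipe i can whip up in 20 min using random ingredients i have at home like eggs and spinach?",
]
style_counts = Counter(tag for q in sample_queries for tag in style_tags(q))
print(style_counts)
```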


In [25]:
def generate_queries_parallel(num_queries: int = 5, model: str = "gpt-4o-mini", model_kwargs: dict | None = None):
    """Generate queries in parallel for all dimension tuples."""
    model_kwargs = model_kwargs or {}  # avoid a shared mutable default argument
    timestamp = datetime.now().strftime("%Y%m%d_%H%M")

    # Run in parallel; the context manager shuts the pool down once all results are in
    worker = partial(generate_synth_queries, num_queries=num_queries, model=model, model_kwargs=model_kwargs)
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
        responses = list(executor.map(worker, valid_dim_tuples))

    # Add query items
    all_queries = []
    for response in responses:
        prompt = response["prompt"]

        queries = [{"prompt": prompt, "query": q, "query_source": "synth"} for q in response["synth_queries"]]

        all_queries.extend(queries)

    # Add to experiment
    bt_experiment = bt.init(project=BT_PROJECT_NAME, experiment=f"add_queries_it_{timestamp}")
    query_id = 1

    for query_item in all_queries:
        qid = f"{timestamp}_{query_id:03d}"
        query_id += 1

        with bt_experiment.start_span(name="add_query") as span:
            span.log(
                input=query_item["prompt"],
                output=query_item["query"],
                metadata={
                    "id": qid,
                    "source": query_item["query_source"],
                    "model": model,
                    "model_kwargs": model_kwargs,
                },
            )

    summary = bt_experiment.summarize(summarize_scores=False)
    return summary, all_queries

**Key takeaways**:

1. Use `braintrust.init()` to manually create a new experiment.
2. Log one span per query, including information for `input`, `output`, and `metadata`.
3. Call `summarize()` on the experiment to review the experiment results.
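
The fan-out and ID scheme used above can be tried without any API calls. In this sketch, `fake_generate` is a hypothetical stand-in for the LLM call and `run_parallel` is an illustrative name; it relies on `Executor.map` returning results in input order, so the sequential IDs stay aligned with the order of the dimension tuples.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def fake_generate(dim_tuple, num_queries):
    """Stand-in for the LLM call: returns num_queries placeholder strings."""
    goal = dim_tuple["dietary_goal"]
    return {"synth_queries": [f"{goal} query {i}" for i in range(1, num_queries + 1)]}


def run_parallel(dim_tuples, num_queries, timestamp, max_workers=5):
    """Generate queries concurrently, then assign zero-padded sequential IDs."""
    worker = partial(fake_generate, num_queries=num_queries)
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        responses = list(executor.map(worker, dim_tuples))  # results come back in input order

    items = []
    for response in responses:
        for query in response["synth_queries"]:
            items.append({"id": f"{timestamp}_{len(items) + 1:03d}", "query": query})
    return items


items = run_parallel(
    [{"dietary_goal": "bulking"}, {"dietary_goal": "cutting"}],
    num_queries=2,
    timestamp="20250728_2025",
)
print(items[0]["id"], items[-1]["id"])  # 20250728_2025_001 20250728_2025_004
```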


In [26]:
summary, queries = generate_queries_parallel(num_queries=5, model="gpt-4o-mini")
# summary, queries = generate_queries_parallel(num_queries=2, model="gpt-4o-mini")

print(len(queries))
print(summary)

95

See results for add_queries_it_20250728_2025 at https://www.braintrust.dev/app/aie-course-2025/p/recipe-bot/experiments/add_queries_it_20250728_2025


### Remove invalid queries (human/SME review)

We'll do this in the Braintrust UI.


## Step 4: Update our `user_queries` dataset


In [32]:
queries_ds = bt.init_dataset(project=BT_PROJECT_NAME, name="user_queries")
# rows = list(queries_ds)[:5]
rows = list(queries_ds)

print(len(rows))
rows[:2]



46


[{'_pagination_key': 'p07532328680059764741',
  '_xact_id': '1000195526467439219',
  'created': '2025-07-29T02:42:12.770Z',
  'dataset_id': '15b4cc00-8f27-400b-a312-31cb036fe7e4',
  'expected': None,
  'id': '46',
  'input': "Give me a Japanese-inspired meal prep recipe using Trader Joe's frozen edamame and their miso ginger broth with tofu for 25g+ protein",
  'is_root': True,
  'metadata': {'id': 46, 'source': 'handcoded', 'submitted_by': 'wayde4'},
  'origin': None,
  'project_id': '54782088-3418-41ea-acda-d6302c2dfa64',
  'root_span_id': '46',
  'span_id': '46',
  'tags': None},
 {'_pagination_key': 'p07532328680059764740',
  '_xact_id': '1000195526467439219',
  'created': '2025-07-29T02:42:12.771Z',
  'dataset_id': '15b4cc00-8f27-400b-a312-31cb036fe7e4',
  'expected': None,
  'id': '45',
  'input': 'Create a Mexican-style ground beef bowl recipe with black beans from Costco that has 40g protein and uses their pre-made guacamole',
  'is_root': True,
  'metadata': {'id': 45, 'source

In [29]:
def get_synth_queries():
    """Fetch the synthetic queries that reviewers marked as good, following cursors across pages."""
    rows = []
    cursor = None
    while True:
        response = requests.post(
            "https://staging-api.braintrust.dev/btql",
            json={
                "query": dedent("""
                        select: output, metadata
                        from: experiment('2b77fc6c-f5bd-4ea5-9060-de29da427816')
                        filter: scores."is_good" = 1
                """)
                + (f" | cursor: '{cursor}'" if cursor else ""),
                "use_brainstore": True,
                "brainstore_realtime": True,  # Include the latest realtime data, but a bit slower.
            },
            headers={"Authorization": "Bearer " + os.environ["BRAINTRUST_API_KEY"]},
        )
        response.raise_for_status()
        response_json = response.json()
        data = response_json.get("data", [])
        rows.extend(data)

        # Keep paging until the API stops returning a cursor (or returns an empty page).
        cursor = response_json.get("cursor")
        if not cursor or not data:
            break

    return rows

In [30]:
synth_queries = get_synth_queries()

print(len(synth_queries))
synth_queries[:2]

89


[{'metadata': {'human_review': '',
   'id': '20250728_2025_095',
   'model': 'gpt-4o-mini',
   'model_kwargs': {},
   'source': 'synth'},
  'output': 'plz direct me to a good bulking recipe with oats and peanut butter that can be made in an hour, wanna shop at Vons.'},
 {'metadata': {'id': '20250728_2025_093',
   'model': 'gpt-4o-mini',
   'model_kwargs': {},
   'source': 'synth'},
  'output': 'Need a recipe for bulking with chicken breast and sweet potatoes that takes about an hour with ingredients from Ralphs! 😋'}]

In [None]:
for synth_example in synth_queries:
    # `record_id` avoids shadowing the built-in `id`
    record_id = queries_ds.insert(input=synth_example["output"], metadata=synth_example["metadata"])
    # print("Inserted record with id", record_id)
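
The loop above inserts every reviewed query, so re-running the cell would likely add the same queries again. One hedged way to guard against that is to filter out queries whose text already exists in the dataset before inserting. `select_new_queries` is a hypothetical helper, and the case- and whitespace-insensitive comparison is an assumption about what should count as a duplicate.

```python
def select_new_queries(existing_inputs, synth_rows):
    """Return synth rows whose text is not already present, ignoring case and spacing."""

    def normalize(text):
        return " ".join(text.lower().split())

    seen = {normalize(text) for text in existing_inputs}
    fresh = []
    for row in synth_rows:
        key = normalize(row["output"])
        if key in seen:
            continue
        seen.add(key)
        fresh.append(row)
    return fresh


existing = ["Give me a quick chicken recipe"]
candidates = [
    {"output": "give me a  quick chicken recipe", "metadata": {"source": "synth"}},
    {"output": "meal prep ideas for bulking", "metadata": {"source": "synth"}},
    {"output": "Meal prep ideas for bulking", "metadata": {"source": "synth"}},
]
fresh_rows = select_new_queries(existing, candidates)
print([row["output"] for row in fresh_rows])  # ['meal prep ideas for bulking']
```

In the notebook, `existing` could be built from the `input` field of the rows already loaded from `user_queries`.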

## Fin
