# Synthetic User Message Generation for SmartBite
 
This notebook includes the following sections:

- [1. Import Libraries](#1-import-libraries)
- [2. Define Models and Classes](#2-define-models-and-classes)
- [3. Initialize LLM and Output Parser](#3-initialize-llm-and-output-parser)
- [4. Create Prompt Templates](#4-create-prompt-templates)
- [5. Generate Messages for Each Intention](#5-generate-messages-for-each-intention)
- [6. Generate "None" Intention Messages](#6-generate-none-intention-messages)

## 1. Import Libraries

In [1]:
from pydantic import BaseModel, Field
from langchain.output_parsers import PydanticOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
from dotenv import load_dotenv
from auxiliar import add_messages, sanitize_input

# Load environment variables
load_dotenv()

True

## 2. Define Models and Classes

In [3]:
# Define the synthetic user message model
class SyntheticUserMessage(BaseModel):
    message: str = Field(
        ...,
        title="Message",
        description="The user message to generate for the target task intention.",
    )

# Define a model for a list of synthetic user messages
class ListSyntheticUserMessages(BaseModel):
    messages: list[SyntheticUserMessage] = Field(
        ...,
        title="Messages",
        description="The list of synthetic user messages to generate for the target task intention.",
    )

## 3. Initialize LLM and Output Parser

In [2]:
# Initialize the LLM
llm = ChatOpenAI(temperature=0.0, model="gpt-4o-mini")

# Define an output parser for generating messages in the correct format
output_parser = PydanticOutputParser(pydantic_object=ListSyntheticUserMessages)

## 4. Create Prompt Templates

In [4]:
# System prompt used to generate messages for one target intention at a time
system_prompt = """
You are responsible for generating synthetic user messages for SmartBite, a virtual assistant specializing in recipes and nutrition.

User Intentions:
{user_intentions}

Task:
Generate {k} unique and distinct messages specifically for the following target intention:
"{target_task_intention}"

Description:
{target_task_intention_description}

Guidelines:
1. **Relevance:** Ensure each message strictly pertains to the target intention without deviating to other intents.
2. **Length:** Each message should contain between 5 and 20 words.
3. **Naturalness:** Messages should mimic natural user queries, avoiding jargon or overly formal language.
4. **Clarity:** Avoid ambiguity; ensure messages are clear and direct.
5. **Format Compliance:** Adhere strictly to the specified format to maintain consistency.

Output Format:
{format_instructions}
"""


# Create a prompt template using the system prompt
prompt = PromptTemplate(
    template=system_prompt,
    input_variables=[
        "k",
        "user_intentions",
        "target_task_intention",
        "target_task_intention_description",
    ],
    partial_variables={"format_instructions": output_parser.get_format_instructions()},
)
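Before calling the model, it can help to preview the rendered prompt. The sketch below uses plain `str.format` as a stand-in for `PromptTemplate.format`, with a shortened template and a placeholder for the format instructions; `PREVIEW_TEMPLATE`, `format_intentions`, and `render_preview` are illustrative names that do not appear in the notebook. It also joins the intention list into bullets, on the assumption that a Python list passed directly would be rendered as its `repr` (brackets and quotes) inside the prompt.

```python
# Shortened stand-in for the notebook's system prompt (hypothetical text).
PREVIEW_TEMPLATE = """User Intentions:
{user_intentions}

Task:
Generate {k} unique and distinct messages specifically for the following target intention:
"{target_task_intention}"

Output Format:
{format_instructions}
"""


def format_intentions(intentions: list[str]) -> str:
    """Render intention names as one bullet per line."""
    return "\n".join(f"- {name}" for name in intentions)


def render_preview(k: int, intentions: list[str], target: str) -> str:
    """Fill the template with str.format, mimicking what the prompt template does."""
    return PREVIEW_TEMPLATE.format(
        k=k,
        user_intentions=format_intentions(intentions),
        target_task_intention=target,
        format_instructions="<parser format instructions go here>",
    )


preview = render_preview(3, ["nutrition_info", "about_company"], "nutrition_info")
print(preview)
```

If the bulleted form reads better, the same `format_intentions` helper could be applied to `SMARTBITE_INTENTIONS` before it is passed to the chain.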

## 5. Generate Messages for Each Intention

In [None]:
# Initialize synthetic data generation chain
synthetic_data_chain = prompt | llm | output_parser

In [4]:
# Define SmartBite intentions and descriptions
SMARTBITE_INTENTIONS = [
    "personalized_recipe",
    "ingredient_based_recipe",
    "nutrition_info",
    "step_by_step_instruction",
    "nutrition_goal_sorting",
    "recipe_difficulty_filter",
    "ingredient_update",
    "save_favorite_recipe",
    "about_company",
]

# Map each intention to the description that is passed to the prompt
intentions_with_descriptions = {
    "personalized_recipe": "The user seeks recipe recommendations tailored to their individual preferences and dietary restrictions, such as vegan, gluten-free, or vegetarian options.",
    "ingredient_based_recipe": "The user wants to discover recipes they can prepare using specific ingredients they currently have, for example, chicken and rice.",
    "nutrition_info": "The user is looking for detailed nutritional information about a particular recipe or ingredient, including calories, protein, fat, and other nutrients.",
    "step_by_step_instruction": "The user requires clear and detailed step-by-step instructions for preparing a recipe, encompassing both preparation and cooking procedures.",
    "nutrition_goal_sorting": "The user wishes to organize recipes based on their nutritional objectives, such as high protein content or low calorie.",
    "recipe_difficulty_filter": "The user aims to filter recipes according to their difficulty level, categorizing them as easy, simple, hard, difficult or medium.",
    "ingredient_update": "The user intends to add or remove ingredients from their inventory to keep it current and accurate.",
    "save_favorite_recipe": "The user wants to mark a recipe as a favorite.",
    "about_company": "The user wants information about SmartBite itself, such as its services, core values, and community initiatives.",
}

In [5]:
file_name = "synthetic_intentions.json"

In [7]:
# Generate synthetic messages for each intention
for intention, description in intentions_with_descriptions.items():
    print(f"Generating messages for intention: {intention}")
    
    response = synthetic_data_chain.invoke({
        "k": 80,
        "user_intentions": SMARTBITE_INTENTIONS,
        "target_task_intention": intention,
        "target_task_intention_description": sanitize_input(description),
    })

    synthetic_messages = [
        {"Intention": intention, "Message": message.message}
        for message in response.messages
    ]
    add_messages(synthetic_messages, file_name)
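The `add_messages` helper comes from the local `auxiliar` module, which is not shown in this notebook. As a rough idea of what such a helper might do, here is a minimal sketch that appends records to a JSON array on disk. The name `append_messages`, the array-of-objects file layout, and the return value are all assumptions; the real helper may behave differently.

```python
import json
import tempfile
from pathlib import Path


def append_messages(new_messages: list[dict], file_name: str) -> int:
    """Append records to a JSON array on disk and return the new total.

    Hypothetical stand-in for auxiliar.add_messages; the real helper may differ.
    """
    path = Path(file_name)
    # Start from the existing array if the file is already there.
    existing = json.loads(path.read_text(encoding="utf-8")) if path.exists() else []
    existing.extend(new_messages)
    # ensure_ascii=False keeps characters such as curly apostrophes readable.
    path.write_text(
        json.dumps(existing, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return len(existing)


# Demo in a temporary directory so no real dataset file is touched.
with tempfile.TemporaryDirectory() as tmp:
    demo_file = str(Path(tmp) / "demo.json")
    append_messages(
        [{"Intention": "nutrition_info", "Message": "How many calories are in hummus?"}],
        demo_file,
    )
    total = append_messages(
        [{"Intention": "about_company", "Message": "What does SmartBite actually do?"}],
        demo_file,
    )
    saved = json.loads(Path(demo_file).read_text(encoding="utf-8"))

print(total)
```

Appending after every intention, rather than writing once at the end, means a failed API call partway through the loop does not lose the batches already generated.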

Generating messages for intention: personalized_recipe
Generating messages for intention: ingredient_based_recipe
Generating messages for intention: nutrition_info
Generating messages for intention: step_by_step_instruction
Generating messages for intention: nutrition_goal_sorting
Generating messages for intention: recipe_difficulty_filter
Generating messages for intention: ingredient_update
Generating messages for intention: save_favorite_recipe
Generating messages for intention: about_company
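
The prompt asks for unique messages of 5 to 20 words, but nothing in the loop checks that the model complied. A minimal post-check, sketched below, could be run on each batch before saving. `filter_messages` and the sample strings are illustrative and not part of the notebook, and counting words by whitespace splitting is an assumption about what "words" means here.

```python
def filter_messages(
    messages: list[str], min_words: int = 5, max_words: int = 20
) -> list[str]:
    """Keep messages within the word-count range, dropping case-insensitive duplicates."""
    seen: set[str] = set()
    kept: list[str] = []
    for text in messages:
        # Lowercase and collapse whitespace so near-identical strings compare equal.
        normalized = " ".join(text.lower().split())
        n_words = len(normalized.split())
        if not (min_words <= n_words <= max_words):
            continue
        if normalized in seen:
            continue
        seen.add(normalized)
        kept.append(text.strip())
    return kept


sample = [
    "Show me a vegan dinner idea for tonight",   # 8 words: kept
    "show me a vegan dinner idea for tonight ",  # duplicate after normalizing: dropped
    "Vegan recipes please",                       # 3 words: too short, dropped
]
clean = filter_messages(sample)
print(clean)
```

Comparing `len(clean)` with the requested `k` also gives a quick signal of how many usable messages a call actually produced.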


## 6. Generate "None" Intention Messages

In [6]:
# Prompt for small-talk messages that match none of the SmartBite intentions
none_intention_prompt = """
You are tasked with generating synthetic user messages for casual small talk and unrelated conversations.

The user intentions are:
{user_intentions}

Your task is to create {k} distinct messages that are entirely unrelated to the provided user intentions. 
These messages should reflect natural human interactions and typical small talk, covering general topics, personal interests, opinions, or everyday questions. 
The focus is to simulate conversational inputs that do not align with any specific task or structured intention.

Examples of such small talk include:
- Greetings: "Hi, how are you?" or "Good morning!"
- Interests: "Do you like football?" or "What’s your favorite movie?"
- Opinions: "I think rainy days are cozy."
- Random questions: "What do you think about aliens?" or "Have you ever traveled to Italy?"

Guidelines:
1. Avoid including content that overlaps with or hints at the provided user intentions.
2. Create diverse messages spanning casual, lighthearted, or open-ended conversation topics.
3. Each message should be concise, between 5 and 20 words.
4. Ensure the messages are natural, varied, and reflective of typical human conversation.
5. Use a polite and friendly tone to simulate engaging dialogue.

Message format:
{format_instructions}
"""


none_intention_prompt = PromptTemplate(
    template=none_intention_prompt,
    input_variables=["k", "user_intentions"],
    partial_variables={"format_instructions": output_parser.get_format_instructions()},
)

# Rebind the chain to the small-talk prompt (this replaces the per-intention chain from section 5)
synthetic_data_chain = none_intention_prompt | llm | output_parser

In [7]:
response_none = synthetic_data_chain.invoke({
    "k": 80,
    "user_intentions": SMARTBITE_INTENTIONS,
})

none_related_messages = [
    {"Intention": "None_related", "Message": sanitize_input(message.message)}
    for message in response_none.messages
]
add_messages(none_related_messages, file_name)
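
Each call requests 80 messages, but the model may return fewer, so a quick count per label helps confirm the dataset is reasonably balanced before it is used for training. The sketch below works on an in-memory list of records with the same `Intention` and `Message` keys used above; `count_by_intention` and the sample records are illustrative only.

```python
from collections import Counter


def count_by_intention(records: list[dict]) -> Counter:
    """Count how many messages were collected per intention label."""
    return Counter(record["Intention"] for record in records)


# Illustrative records, not real model output.
records = [
    {"Intention": "nutrition_info", "Message": "How much protein is in lentil soup?"},
    {"Intention": "nutrition_info", "Message": "Is this pasta dish high in calories?"},
    {"Intention": "None_related", "Message": "Have you ever traveled to Italy?"},
]
counts = count_by_intention(records)
for label, n in counts.most_common():
    print(f"{label}: {n}")
```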