# Simulate conversations from conversation starters

## Objective

Use the Simulator to generate high-quality simulated data from predefined conversation starters using LLMs.

This tutorial uses the following Azure AI services:

- Access to Azure OpenAI Service - you can apply for access [here](https://go.microsoft.com/fwlink/?linkid=2222006)
- An Azure AI Studio project - go to [aka.ms/azureaistudio](https://aka.ms/azureaistudio) to create a project

## Time

You should expect to spend 5-10 minutes running this sample. 

## About this example

Large Language Models (LLMs) can help you create query and response datasets from your existing data sources. These datasets are useful for tasks such as testing your retrieval capabilities, evaluating and improving your RAG workflows, and tuning your prompts. In this sample, we explore how to use the Simulator to generate high-quality simulated data from predefined conversation starters using LLMs.


## Before you begin

### Installation

Install the packages required to run this notebook.

In [None]:
# Install the packages
%pip install azure-identity azure-ai-evaluation
%pip install wikipedia openai

## Connect to your project

To start, let us set the connection details for your project. The code below needs your Azure OpenAI API version, deployment name, and endpoint. Replace the items in <> with the values for your project.

In [None]:
# Use the following code to set the variables with your values

azure_openai_api_version = "<your-api-version>"
azure_openai_deployment = "<your-deployment>"
azure_openai_endpoint = "<your-endpoint>"

In [None]:
import os

os.environ["AZURE_OPENAI_API_VERSION"] = azure_openai_api_version
os.environ["AZURE_OPENAI_DEPLOYMENT"] = azure_openai_deployment
os.environ["AZURE_OPENAI_ENDPOINT"] = azure_openai_endpoint


model_config = {
    "azure_endpoint": os.environ.get("AZURE_OPENAI_ENDPOINT"),
    "azure_deployment": os.environ.get("AZURE_OPENAI_DEPLOYMENT"),
}
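
Before moving on, it can help to confirm that none of the values is still a `<...>` placeholder. The following is a minimal sketch, not part of the SDK; `find_unset_variables` is a hypothetical helper introduced only for this check.

```python
import os

# Names of the environment variables this notebook sets above.
REQUIRED_VARS = [
    "AZURE_OPENAI_API_VERSION",
    "AZURE_OPENAI_DEPLOYMENT",
    "AZURE_OPENAI_ENDPOINT",
]


def find_unset_variables(names, environ=None):
    """Return the names that are missing, empty, or still a <placeholder>."""
    environ = os.environ if environ is None else environ
    unset = []
    for name in names:
        value = environ.get(name, "")
        # Treat "<your-endpoint>"-style values as not yet filled in.
        if not value or (value.startswith("<") and value.endswith(">")):
            unset.append(name)
    return unset


unset = find_unset_variables(REQUIRED_VARS)
if unset:
    print("Set these variables before continuing:", ", ".join(unset))
```

If the cell prints nothing, all three variables hold non-placeholder values. It does not verify that the endpoint or deployment actually exists.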

Let us create the simulator with this model configuration.

In [None]:
from azure.ai.evaluation.simulator import Simulator

simulator = Simulator(model_config=model_config)

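Next, connect the simulator to your application. The simulator calls the `callback` function below with the conversation so far; the callback sends the latest user message to your application and appends the reply to the message list.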

In [None]:
from typing import List, Dict, Any, Optional
from openai import AzureOpenAI
from azure.identity import DefaultAzureCredential, get_bearer_token_provider


def call_to_your_ai_application(query: str) -> str:
    # logic to call your application
    # use a try except block to catch any errors
    token_provider = get_bearer_token_provider(DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default")
    deployment = os.environ.get("AZURE_OPENAI_DEPLOYMENT")
    endpoint = os.environ.get("AZURE_OPENAI_ENDPOINT")
    client = AzureOpenAI(
        azure_endpoint=endpoint,
        api_version=os.environ.get("AZURE_OPENAI_API_VERSION"),
        azure_ad_token_provider=token_provider,
    )
    completion = client.chat.completions.create(
        model=deployment,
        messages=[
            {
                "role": "user",
                "content": query,
            }
        ],
        max_tokens=800,
        temperature=0.7,
        top_p=0.95,
        frequency_penalty=0,
        presence_penalty=0,
        stop=None,
        stream=False,
    )
    message = completion.to_dict()["choices"][0]["message"]
    # change this to return the response from your application
    return message["content"]


async def callback(
    messages: Dict[str, Any],  # payload whose "messages" key holds the chat history
    stream: bool = False,
    session_state: Any = None,  # noqa: ANN401
    context: Optional[Dict[str, Any]] = None,
) -> dict:
    messages_list = messages["messages"]
    # get last message
    latest_message = messages_list[-1]
    query = latest_message["content"]
    context = None
    # call your endpoint or ai application here
    response = call_to_your_ai_application(query)
    # we are formatting the response to follow the openAI chat protocol format
    formatted_response = {
        "content": response,
        "role": "assistant",
        "context": {
            "citations": None,
        },
    }
    messages["messages"].append(formatted_response)
    return {"messages": messages["messages"], "stream": stream, "session_state": session_state, "context": context}
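
To see the shape of the data the callback receives and returns without calling Azure, here is a minimal sketch that swaps in a stub application. `fake_application` and `demo_callback` are hypothetical names used only for illustration; the sketch also shows one way to apply the try/except advice from the comment in `call_to_your_ai_application`.

```python
import asyncio
from typing import Any, Dict, Optional


def fake_application(query: str) -> str:
    # Hypothetical stand-in for call_to_your_ai_application; makes no network call.
    if not query:
        raise ValueError("empty query")
    return f"You said: {query}"


async def demo_callback(
    messages: Dict[str, Any],
    stream: bool = False,
    session_state: Any = None,
    context: Optional[Dict[str, Any]] = None,
) -> dict:
    # The latest message in the history is the user's turn.
    query = messages["messages"][-1]["content"]
    try:
        response = fake_application(query)
    except Exception as exc:
        # Return a readable message instead of letting the exception propagate.
        response = f"Application error: {exc}"
    # Append the reply in the same chat format the callback above uses.
    messages["messages"].append(
        {"role": "assistant", "content": response, "context": {"citations": None}}
    )
    return {
        "messages": messages["messages"],
        "stream": stream,
        "session_state": session_state,
        "context": context,
    }


payload = {"messages": [{"role": "user", "content": "Hello, how are you?"}]}
result = asyncio.run(demo_callback(payload))
print(result["messages"][-1]["content"])
```

In a notebook, where an event loop is already running, replace `asyncio.run(demo_callback(payload))` with `await demo_callback(payload)`.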

### Generate Query Responses with conversation starter
Here's how we would define conversation starters. The code below defines 2 simulations. Each simulation has 3 predefined turns to start the conversation. Customize these turns (their number, their content, or both) to make them relevant to your application.

In [None]:
conversation_turns = [
    # simulation 1
    [
        "Hello, how are you?",  # simulator conversation starter (turn 1)
        "I want to learn more about Paris",  # conversation turn 2
        "Thanks for helping me. What else should I know about Paris for my project?",  # conversation turn 3
    ],
    # simulation 2
    [
        "Hey, I really need your help to finish my homework.",
        "I need to write an essay about Paris",
        "Thanks, can you rephrase your last response to make me sound smarter?",
    ],
]
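
When you customize the turns, a quick structural check can catch mistakes such as an empty simulation or a non-string turn before any model call is made. This is a minimal sketch; `validate_conversation_turns` is a hypothetical helper, not part of the Simulator API, and it only assumes the list-of-lists-of-strings shape shown above.

```python
from typing import Any, List


def validate_conversation_turns(turns: List[Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means the shape is fine."""
    problems = []
    for i, simulation in enumerate(turns, start=1):
        if not isinstance(simulation, list) or not simulation:
            problems.append(f"simulation {i}: expected a non-empty list of turns")
            continue
        for j, turn in enumerate(simulation, start=1):
            if not isinstance(turn, str) or not turn.strip():
                problems.append(f"simulation {i}, turn {j}: expected a non-empty string")
    return problems


example_turns = [
    ["Hello, how are you?", "I want to learn more about Paris"],
    ["Hey, I really need your help to finish my homework."],
]
for problem in validate_conversation_turns(example_turns):
    print(problem)
```

You could run the same check on `conversation_turns` before passing it to the simulator.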

### Call to simulator
The call to the simulator here describes the basic usage of having a conversation with your application using the conversation starters defined above.
The conversation starts with the first turn of the simulation, which is passed to the `callback` method as the user's turn. Each following predefined turn is passed the same way, up to `max_conversation_turns` turns. If the conversation starter defines fewer turns than `max_conversation_turns`, the simulator generates the remaining user messages from the conversation history.

In [None]:
outputs = await simulator(
    target=callback,
    conversation_turns=conversation_turns,
    max_conversation_turns=5,
)
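
The turn logic described above can be pictured with a small sketch. This is a simplified model, not the Simulator's implementation: `plan_user_turns` is a hypothetical function, and it assumes that `max_conversation_turns` counts user turns, each followed by one response from your application.

```python
from typing import List, Optional, Tuple


def plan_user_turns(
    predefined: List[str], max_conversation_turns: int
) -> List[Tuple[str, Optional[str]]]:
    """Label each user turn as predefined (with its text) or simulated (text unknown)."""
    plan = []
    for index in range(max_conversation_turns):
        if index < len(predefined):
            # Predefined turns are used first, in order.
            plan.append(("predefined", predefined[index]))
        else:
            # Remaining turns would be generated from the conversation history.
            plan.append(("simulated", None))
    return plan


starter = [
    "Hello, how are you?",
    "I want to learn more about Paris",
    "Thanks for helping me. What else should I know about Paris for my project",
]
for kind, text in plan_user_turns(starter, max_conversation_turns=5):
    print(kind, text)
```

Under that assumption, a 3-turn starter with `max_conversation_turns=5` gives three predefined user turns followed by two simulated ones, and a starter longer than the limit is cut off at the limit.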

### Save the generated data for later use

In [None]:
from pathlib import Path
import json

output_file = Path("output.json")
# Open in write mode: appending a second JSON document would make the file invalid JSON.
with output_file.open("w") as f:
    json.dump(outputs, f)
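
If you want to accumulate conversations across several runs, JSON Lines is one option, because each line is a complete JSON document and appending keeps the file readable. The sketch below assumes each conversation is a JSON-serializable dict; `save_as_jsonl`, `load_jsonl`, and the sample data are hypothetical and stand in for the simulator output.

```python
import json
import tempfile
from pathlib import Path


def save_as_jsonl(conversations, path: Path) -> None:
    # Append one JSON document per line.
    with path.open("a", encoding="utf-8") as f:
        for conversation in conversations:
            f.write(json.dumps(conversation) + "\n")


def load_jsonl(path: Path):
    # Parse every non-empty line back into a Python object.
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Made-up sample standing in for one simulated conversation.
sample = [
    {
        "messages": [
            {"role": "user", "content": "Hello"},
            {"role": "assistant", "content": "Hi"},
        ]
    }
]
demo_path = Path(tempfile.mkdtemp()) / "output.jsonl"
save_as_jsonl(sample, demo_path)
save_as_jsonl(sample, demo_path)  # a second run appends instead of overwriting
print(len(load_jsonl(demo_path)))
```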