# Simulate Queries and Responses from input text

## Objective

Use the Simulator to generate high-quality queries and responses from your data using LLMs.

This tutorial uses the following Azure AI services:

- Access to Azure OpenAI Service - you can apply for access [here](https://go.microsoft.com/fwlink/?linkid=2222006)
- An Azure AI Studio project - go to [aka.ms/azureaistudio](https://aka.ms/azureaistudio) to create a project

## Time

You should expect to spend 5-10 minutes running this sample. 

## About this example

Large Language Models (LLMs) can help you create query and response datasets from your existing data sources, such as text or an index. These datasets are useful for tasks such as testing your retrieval capabilities, evaluating and improving your RAG workflows, and tuning your prompts. In this sample, we explore how to use the Simulator to generate high-quality query and response pairs from your data using LLMs, and then use those pairs to simulate interactions with your application.




### Data

In this sample we use text from a Wikipedia article as the source data. You can follow the same steps with any other source document relevant to your application. Make sure that the length of the text is within the context length of the selected Azure AI model.
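As a rough way to check the length requirement above before running the simulator, the sketch below estimates the token count from the character count. The helper names, the ratio of about 4 characters per token (a common rule of thumb for English text), the `context_window` value, and the `reserved_tokens` headroom are all assumptions for illustration; use a real tokenizer and your model's documented limit for an exact answer.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate; ~4 characters per token is only a rule of thumb."""
    return int(len(text) / chars_per_token) + 1


def fits_context(text: str, context_window: int, reserved_tokens: int = 2000) -> bool:
    """Return True if the estimate leaves `reserved_tokens` of headroom for prompts and output."""
    return estimate_tokens(text) <= context_window - reserved_tokens


# Placeholder text and a placeholder context window; replace both with your own values.
sample_text = "Leonardo da Vinci was an Italian polymath of the High Renaissance. " * 50
print(estimate_tokens(sample_text), fits_context(sample_text, context_window=128_000))
```

If the check fails, truncate or split the source text before passing it to the simulator.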

## Before you begin



### Installation

Install the following packages required to execute this notebook. 



In [None]:
# Install the packages
%pip install azure-identity azure-ai-evaluation
%pip install promptflow-azure
%pip install wikipedia openai

### Parameters

Let's initialize some variables. The notebook needs a connection to an LLM. This sample uses a `gpt-4o-mini` deployment in your Azure AI project. Replace `azure_openai_endpoint` with the URL of your endpoint. If your application calls `AzureOpenAI`'s chat completion endpoint, replace the values in `<>` with your `AzureOpenAI` deployment details.



In [None]:
# project details
azure_openai_api_version = "<your-api-version>"
azure_openai_endpoint = "<your-endpoint>"
azure_openai_deployment = "gpt-4o-mini"  # replace with your deployment name, if different

# Optionally set the azure_ai_project to upload the evaluation results to Azure AI Studio.
azure_ai_project = {
    "subscription_id": "<your-subscription-id>",
    "resource_group": "<your-resource-group>",
    "workspace_name": "<your-workspace-name>",
}

In [None]:
import os

os.environ["AZURE_OPENAI_ENDPOINT"] = azure_openai_endpoint
os.environ["AZURE_OPENAI_DEPLOYMENT"] = azure_openai_deployment
os.environ["AZURE_OPENAI_API_VERSION"] = azure_openai_api_version

### Connect to your project

To start, let us create a model configuration with your deployment details.

In [None]:
import json
from pathlib import Path

model_config = {
    "azure_endpoint": azure_openai_endpoint,
    "azure_deployment": azure_openai_deployment,
    "api_version": azure_openai_api_version,
}

# A model that supports JSON mode is preferred to avoid errors, e.g. gpt-4o-mini, gpt-4o, gpt-4 (1106)

Let us connect to the project. `DefaultAzureCredential` is picked up by the SDK, which runs the prompty files, to authenticate your requests. If you prefer to authenticate with your Azure OpenAI key, set `api_key` in your `model_config`.
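A minimal sketch of the key-based variant mentioned above. The environment variable name `AZURE_OPENAI_API_KEY` and the dictionary name `model_config_with_key` are assumptions chosen for illustration; only the `api_key` entry itself comes from the text above. Avoid hard-coding the key in the notebook.

```python
import os

# Hypothetical variable name; read the key from wherever your setup stores secrets.
api_key = os.environ.get("AZURE_OPENAI_API_KEY", "<your-api-key>")

# Same fields as the model configuration above, plus `api_key`.
model_config_with_key = {
    "azure_endpoint": os.environ.get("AZURE_OPENAI_ENDPOINT", "<your-endpoint>"),
    "azure_deployment": os.environ.get("AZURE_OPENAI_DEPLOYMENT", "gpt-4o-mini"),
    "api_version": os.environ.get("AZURE_OPENAI_API_VERSION", "<your-api-version>"),
    "api_key": api_key,
}
```

This dictionary would then be passed as `model_config` in place of the credential-based one.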

In [None]:
from azure.ai.evaluation.simulator import Simulator

simulator = Simulator(model_config=model_config)

### Connecting the simulator to your application
This part assumes that your application is a call to `AzureOpenAI`'s chat completion endpoint. Feel free to change this method to call your application with its own configuration.

In [None]:
from typing import List, Dict, Any, Optional
from openai import AzureOpenAI
from azure.identity import DefaultAzureCredential, get_bearer_token_provider


def call_to_your_ai_application(query: str) -> str:
    # logic to call your application
    # use a try except block to catch any errors
    token_provider = get_bearer_token_provider(DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default")

    deployment = os.environ.get("AZURE_OPENAI_DEPLOYMENT")
    endpoint = os.environ.get("AZURE_OPENAI_ENDPOINT")
    client = AzureOpenAI(
        azure_endpoint=endpoint,
        api_version=os.environ.get("AZURE_OPENAI_API_VERSION"),
        azure_ad_token_provider=token_provider,
    )
    completion = client.chat.completions.create(
        model=deployment,
        messages=[
            {
                "role": "user",
                "content": query,
            }
        ],
        max_tokens=800,
        temperature=0.7,
        top_p=0.95,
        frequency_penalty=0,
        presence_penalty=0,
        stop=None,
        stream=False,
    )
    message = completion.to_dict()["choices"][0]["message"]
    # change this to return the response from your application
    return message["content"]


async def callback(
    messages: Dict[str, List[Dict]],  # the body indexes messages["messages"], so this is a dict
    stream: bool = False,
    session_state: Any = None,  # noqa: ANN401
    context: Optional[Dict[str, Any]] = None,
) -> dict:
    messages_list = messages["messages"]
    # get last message
    latest_message = messages_list[-1]
    query = latest_message["content"]
    context = None
    # call your endpoint or ai application here
    response = call_to_your_ai_application(query)
    # we are formatting the response to follow the openAI chat protocol format
    formatted_response = {
        "content": response,
        "role": "assistant",
        "context": {
            "citations": None,
        },
    }
    messages["messages"].append(formatted_response)
    return {"messages": messages["messages"], "stream": stream, "session_state": session_state, "context": context}
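
The comment in `call_to_your_ai_application` recommends a try/except block, but the cell does not include one. The sketch below shows one possible way to add it without changing the application function. `call_with_fallback` and the two stub functions are hypothetical helpers, not part of the SDK, and returning a fallback string instead of re-raising is a design assumption that keeps a long simulation running.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)


def call_with_fallback(app_call: Callable[[str], str], query: str, fallback: str = "") -> str:
    """Call the application; return `fallback` if it raises or returns None."""
    try:
        response = app_call(query)
    except Exception:  # broad on purpose: one failed call should not stop the simulation
        logger.exception("Application call failed for query: %r", query)
        return fallback
    return response if response is not None else fallback


# Stub applications used only to demonstrate the wrapper.
def healthy_app(query: str) -> str:
    return query.upper()


def failing_app(query: str) -> str:
    raise RuntimeError("service unavailable")


print(call_with_fallback(healthy_app, "hello"))
print(call_with_fallback(failing_app, "hello", fallback="Sorry, something went wrong."))
```

Inside `callback`, the call would then read `response = call_with_fallback(call_to_your_ai_application, query)`.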

### Data source for the simulator
In this example we use a Wikipedia article as raw text to generate query-response pairs. Alternatively, text from an Azure Search index can be used as a data source for the simulator. An example of this can be seen in the [simulate_input_index sample](../simulate_input_index/simulate_input_index.ipynb).

In [None]:
import wikipedia

wiki_search_term = "Leonardo da vinci"
wiki_title = wikipedia.search(wiki_search_term)[0]
wiki_page = wikipedia.page(wiki_title)
text = wiki_page.summary[:5000]

### Call to simulator
This call to the simulator generates 4 query-response pairs in its first pass.
In the second pass, it picks one task, pairs it with a query generated in the first pass, and sends both to the configured LLM to build the first user turn. This user turn is then passed to the `callback` method. The conversation continues until it reaches `max_conversation_turns` turns.

The output of the simulator contains the original task, the original query, and the response generated from the first turn as the expected response. You can find them in the `context` key of each conversation.

In [None]:
outputs = await simulator(
    target=callback,
    text=text,
    num_queries=4,
    max_conversation_turns=3,
    tasks=[
        f"I am a student and I want to learn more about {wiki_search_term}",
        f"I am a teacher and I want to teach my students about {wiki_search_term}",
        f"I am a researcher and I want to do a detailed research on {wiki_search_term}",
        f"I am a statistician and I want to do a detailed table of factual data concerning {wiki_search_term}",
    ],
)
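
To inspect what the simulator returned, a sketch like the following can summarize one conversation. It assumes each item in `outputs` behaves like a dictionary with a `messages` list and a `context` dictionary. The key names `task`, `query`, and `expected_response` inside `context` are assumptions, so check them against your actual output. The sample conversation is hand-written for illustration and is not real simulator output.

```python
from typing import Any, Dict, List


def summarize_conversation(conversation: Dict[str, Any]) -> Dict[str, Any]:
    """Collect the context fields and count the turns per role."""
    context = conversation.get("context") or {}
    messages: List[Dict[str, Any]] = conversation.get("messages", [])
    return {
        "task": context.get("task"),
        "query": context.get("query"),
        "expected_response": context.get("expected_response"),
        "user_turns": sum(1 for m in messages if m.get("role") == "user"),
        "assistant_turns": sum(1 for m in messages if m.get("role") == "assistant"),
    }


# Hand-written stand-in for one element of `outputs` (assumed shape).
sample_conversation = {
    "messages": [
        {"role": "user", "content": "Who painted the Mona Lisa?"},
        {"role": "assistant", "content": "Leonardo da Vinci."},
    ],
    "context": {
        "task": "I am a student and I want to learn more about Leonardo da vinci",
        "query": "Who painted the Mona Lisa?",
        "expected_response": "Leonardo da Vinci painted the Mona Lisa.",
    },
}

summary = summarize_conversation(sample_conversation)
print(summary)
```

With real results, `[summarize_conversation(o) for o in outputs]` gives a quick overview of every simulated conversation.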

### Overriding the user simulating behavior
Internally, the SDK has a `prompty` file that defines how the LLM that simulates the user should behave. The SDK also lets you override that file with your own prompty file. Here is a brief overview of how to override the user behavior.

Make sure you have a `user_override.prompty` file in the same directory as this notebook. The file in this repo takes an additional argument called `mood`, which shows how you can pass additional keyword arguments to your prompty.

In [None]:
current_directory = Path.cwd()
user_override_prompty = Path(current_directory) / "user_override.prompty"
user_prompty_kwargs = {"mood": "happy"}

outputs = await simulator(
    target=callback,
    text=text,
    num_queries=4,
    max_conversation_turns=1,
    tasks=[
        f"I am a student and I want to learn more about {wiki_search_term}",
        f"I am a teacher and I want to teach my students about {wiki_search_term}",
        f"I am a researcher and I want to do a detailed research on {wiki_search_term}",
        f"I am a statistician and I want to do a detailed table of factual data concerning {wiki_search_term}",
    ],
    user_simulator_prompty=user_override_prompty,
    user_simulator_prompty_kwargs=user_prompty_kwargs,
)

### Save the generated data for later use

In [None]:
from pathlib import Path

output_file = Path("output.json")
with output_file.open("w") as f:  # "w" rather than "a": appending a second JSON document on re-run makes the file invalid
    json.dump(outputs, f)

### Running evaluations on the simulated data
Here we will try to run GroundednessEvaluator, RelevanceEvaluator, CoherenceEvaluator, FluencyEvaluator, SimilarityEvaluator, F1ScoreEvaluator on the output data from the simulator.

From the documentation we know that running those evaluators requires the following data: `query`, `response`, `context`, and `ground_truth`.

For simplicity's sake, we can use our source document `text` as both `context` and `ground_truth`. This step only evaluates the first user message and first response from your AI Application for each of the simulated conversations.

In [None]:
eval_input_data_json_lines = ""
for output in outputs:
    query = None
    response = None
    context = text
    ground_truth = text
    for message in output["messages"]:
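        # keep only the first user message and the first assistant response
        if message["role"] == "user" and query is None:
            query = message["content"]
        if message["role"] == "assistant" and response is None:
            response = message["content"]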
    if query and response:
        eval_input_data_json_lines += (
            json.dumps(
                {
                    "query": query,
                    "response": response,
                    "context": context,
                    "ground_truth": ground_truth,
                }
            )
            + "\n"
        )

Store the evaluation input in a JSON Lines file.

In [None]:
eval_input_data_file = Path("eval_input_data.jsonl")
with eval_input_data_file.open("w") as f:
    f.write(eval_input_data_json_lines)
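
Before running the evaluators, it can help to confirm that every row in the JSON Lines file carries the four required fields. The sketch below is one way to do that. `find_invalid_rows` is a hypothetical helper, and the demo writes its own temporary file so it does not depend on the simulator output.

```python
import json
import tempfile
from pathlib import Path
from typing import List

REQUIRED_KEYS = ("query", "response", "context", "ground_truth")


def find_invalid_rows(path: Path) -> List[int]:
    """Return 1-based line numbers of rows with a missing or empty required field."""
    bad_lines = []
    with path.open("r", encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            if not line.strip():
                continue  # ignore blank lines
            row = json.loads(line)
            if any(not row.get(key) for key in REQUIRED_KEYS):
                bad_lines.append(line_number)
    return bad_lines


# Demo with two hand-written rows; the second has an empty response.
good_row = {"query": "q", "response": "r", "context": "c", "ground_truth": "g"}
bad_row = {"query": "q", "response": "", "context": "c", "ground_truth": "g"}
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_file = Path(tmp_dir) / "demo.jsonl"
    demo_file.write_text(json.dumps(good_row) + "\n" + json.dumps(bad_row) + "\n", encoding="utf-8")
    invalid = find_invalid_rows(demo_file)
print(invalid)  # prints [2]
```

For the file written above, the call would be `find_invalid_rows(eval_input_data_file)`.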

### Run evaluation
`QAEvaluator` is a composite evaluator that runs GroundednessEvaluator, RelevanceEvaluator, CoherenceEvaluator, FluencyEvaluator, SimilarityEvaluator, and F1ScoreEvaluator.

Optionally set the azure_ai_project to upload the evaluation results to Azure AI Studio.

In [None]:
from azure.ai.evaluation import evaluate, QAEvaluator

qa_evaluator = QAEvaluator(model_config=model_config)

eval_output = evaluate(
    data=str(eval_input_data_file),
    evaluators={"QAEvaluator": qa_evaluator},
    evaluator_config={
        "QAEvaluator": {
            "column_mapping": {
                "query": "${data.query}",
                "response": "${data.response}",
                "context": "${data.context}",
                "ground_truth": "${data.ground_truth}",
            }
        }
    },
    azure_ai_project=azure_ai_project,  # optional to store the evaluation results in Azure AI Studio
    output_path="./myevalresults.json",  # optional to store the evaluation results in a file
)
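
To read the aggregate scores after the run, a sketch like the following can format them. It assumes the result returned by `evaluate` is a dictionary with a `metrics` entry that maps metric names to values. The metric names in the sample are placeholders for illustration, not real results.

```python
from typing import Any, Dict


def format_metrics(result: Dict[str, Any]) -> str:
    """Render the aggregate metrics as one `name: value` line per metric, sorted by name."""
    metrics = result.get("metrics") or {}
    lines = []
    for name, value in sorted(metrics.items()):
        if isinstance(value, float):
            lines.append(f"{name}: {value:.3f}")
        else:
            lines.append(f"{name}: {value}")
    return "\n".join(lines)


# Hand-written stand-in for `eval_output`; names and values are placeholders.
sample_result = {
    "metrics": {"QAEvaluator.relevance": 4.0, "QAEvaluator.f1_score": 0.5},
    "rows": [],
}
print(format_metrics(sample_result))
```

With a real run, `print(format_metrics(eval_output))` would show the scores for the simulated conversations.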

## Cleaning up

To clean up all Azure ML resources used in this example, you can delete the individual resources you created in this tutorial.

If you made a resource group specifically to run this example, you could instead [delete the resource group](https://learn.microsoft.com/en-us/azure/azure-resource-manager/management/delete-resource-group).