# Simulate Queries and Responses from Azure Search Index

## Objective

Use the Simulator to generate high-quality queries and responses from your data in Azure Search using LLMs.

This tutorial uses the following Azure AI services:

- Access to Azure OpenAI Service - you can apply for access [here](https://go.microsoft.com/fwlink/?linkid=2222006)
- An Azure AI Search service - go to [aka.ms/azuresearch](https://aka.ms/azuresearch) to create a service 

## Time

You should expect to spend 5-10 minutes running this sample. 

## About this example

Large Language Models (LLMs) can help you create query and response datasets from existing data sources such as raw text or a search index. These datasets are useful for tasks such as testing retrieval, evaluating and improving RAG workflows, and tuning prompts. In this sample, we use the Simulator to generate high-quality query and response pairs from a search index using LLMs, and then use those pairs to simulate interactions with your application.


### Data

In this sample we will generate text data from an Azure Search index called `realestate-us-sample-index`.  You can follow the same steps replacing the index with any other index in your Azure Search service. To create the index used in this sample, go to **Import data** when creating an index in your Azure Search Service, and select the *realestate-us-sample* data source.

The `realestate-us-sample-index` contains the following fields:


| Field Name       | Type              |
|------------------|-------------------|
| listingId        | String            |
| beds             | Int32             |
| baths            | Int32             |
| description      | String            |
| description_de   | String            | 
| description_fr   | String            |
| description_it   | String            |
| description_es   | String            |
| description_pl   | String            |
| description_nl   | String            |
| sqft             | Int32             |
| daysOnMarket     | Int32             |
| status           | String            |
| source           | String            |
| number           | String            |
| street           | String            |
| unit             | String            | 
| type             | String            |
| city             | String            |
| region           | String            |
| countryCode      | String            |
| postCode         | String            |
| location         | GeographyPoint    |
| price            | Int64             |
| thumbnail        | String            | 
| tags             | StringCollection  |


## Before you begin



### Installation

Install the following packages required to execute this notebook. 



In [None]:
# Install the packages
%pip install azure-identity azure-ai-evaluation
%pip install azure-search-documents

### Parameters

Let's initialize some variables. The notebook needs a connection to an LLM. This sample uses a `gpt-4o-mini` deployment in your Azure AI project. Replace `azure_endpoint` with the URL of your endpoint. If your application calls the `AzureOpenAI` chat completions endpoint, also replace the values in `<>` with your `AzureOpenAI` deployment details.

In [None]:
import os

os.environ["AZURE_OPENAI_DEPLOYMENT"] = "<your-deployment>"
os.environ["AZURE_OPENAI_API_VERSION"] = "<api version>"

In [None]:
# project details
azure_endpoint = "https://<ai-hub>.openai.azure.com"
azure_deployment = "gpt-4o-mini"  # replace with your deployment name, if different

should_cleanup: bool = False

### Connect to your project

To start, create a model configuration with your project details.

In [None]:
import json
from pathlib import Path

model_config = {
    "azure_endpoint": azure_endpoint,
    "azure_deployment": azure_deployment,
}

# A model that supports JSON mode is preferred to avoid errors, e.g. gpt-4o-mini, gpt-4o, gpt-4 (1106)

### Connect to your Azure Search index

In [None]:
search_endpoint = "<your-search-endpoint>"
index_name = "<your-index-name>"
api_key = "<your-api-key>"

Let us connect to the project. The SDK, which runs the prompty files, picks up `DefaultAzureCredential` to authenticate your requests. If you want to authenticate with your AzureOpenAI key instead, set `api_key` in your `model_config`.
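For key-based authentication, the configuration might look like the sketch below. The `AZURE_OPENAI_API_KEY` environment variable name and the `keyed_model_config` variable are assumptions made for illustration, and the endpoint and deployment values repeat the placeholders used earlier in this notebook.

```python
import os

# Hypothetical key-based variant of `model_config`; replace the placeholders with your own values.
keyed_model_config = {
    "azure_endpoint": "https://<ai-hub>.openai.azure.com",
    "azure_deployment": "gpt-4o-mini",
    # Read the key from the environment (assumed variable name) instead of hard-coding it.
    "api_key": os.environ.get("AZURE_OPENAI_API_KEY", "<your-api-key>"),
}
```

Keeping the key in an environment variable avoids committing it along with the notebook.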

In [None]:
from azure.ai.evaluation.simulator import Simulator

simulator = Simulator(model_config=model_config)

Next, connect the simulator to your application. The simulator passes each user turn to the `callback` function defined below.

In [None]:
from typing import List, Dict, Any, Optional
from openai import AzureOpenAI
from azure.identity import DefaultAzureCredential, get_bearer_token_provider


def call_to_your_ai_application(query: str) -> str:
    # logic to call your application
    # use a try except block to catch any errors

    token_provider = get_bearer_token_provider(DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default")
    deployment = os.environ.get("AZURE_OPENAI_DEPLOYMENT")
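    # AZURE_OPENAI_ENDPOINT is not set earlier in this notebook, so fall back to `azure_endpoint`
    endpoint = os.environ.get("AZURE_OPENAI_ENDPOINT", azure_endpoint)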
    client = AzureOpenAI(
        azure_endpoint=endpoint,
        api_version=os.environ.get("AZURE_OPENAI_API_VERSION"),
        azure_ad_token_provider=token_provider,
    )
    completion = client.chat.completions.create(
        model=deployment,
        messages=[
            {
                "role": "user",
                "content": query,
            }
        ],
        max_tokens=800,
        temperature=0.7,
        top_p=0.95,
        frequency_penalty=0,
        presence_penalty=0,
        stop=None,
        stream=False,
    )
    message = completion.to_dict()["choices"][0]["message"]
    # change this to return the response from your application
    return message["content"]


async def callback(
    messages: Dict[str, List[Dict]],
    stream: bool = False,
    session_state: Any = None,  # noqa: ANN401
    context: Optional[Dict[str, Any]] = None,
) -> dict:
    messages_list = messages["messages"]
    # get last message
    latest_message = messages_list[-1]
    query = latest_message["content"]
    context = None
    # call your endpoint or ai application here
    response = call_to_your_ai_application(query)
    # we are formatting the response to follow the openAI chat protocol format
    formatted_response = {
        "content": response,
        "role": "assistant",
        "context": {
            "citations": None,
        },
    }
    messages["messages"].append(formatted_response)
    return {"messages": messages["messages"], "stream": stream, "session_state": session_state, "context": context}
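To see the shape of data the callback receives and returns without calling any Azure service, here is a minimal sketch. `fake_application`, `echo_callback`, and the sample input are hypothetical stand-ins, not part of the SDK; the input shape is assumed from how the callback above indexes `messages["messages"]`.

```python
import asyncio
from typing import Any, Dict, List, Optional


def fake_application(query: str) -> str:
    # Hypothetical stand-in for a real application call; it only echoes the query.
    return f"You asked: {query}"


async def echo_callback(
    messages: Dict[str, List[Dict[str, Any]]],
    stream: bool = False,
    session_state: Any = None,
    context: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    # Read the latest user turn and append an assistant turn in the chat protocol format.
    query = messages["messages"][-1]["content"]
    messages["messages"].append(
        {"role": "assistant", "content": fake_application(query), "context": {"citations": None}}
    )
    return {"messages": messages["messages"], "stream": stream, "session_state": session_state, "context": context}


# Made-up single-turn input for illustration.
sample_input = {"messages": [{"role": "user", "content": "How many bedrooms does the listing have?"}]}
result = asyncio.run(echo_callback(sample_input))
print(result["messages"][-1])
```

Inside a notebook, where an event loop is already running, use `await echo_callback(sample_input)` instead of `asyncio.run(...)`.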

### Generate Query Responses from index
In this example we use the `description` field from the `realestate-us-sample-index` search index as raw text to generate query and response pairs. For any other index, decide which field of each `result` in the code below should supply the text.


Below is an example of what a search against this index might return (this sample response comes from a `*` query, as its `@search.nextPageParameters` shows):

```json
{
  "@odata.context": "https://<your-search-service-name>.search.windows.net/indexes('realestate-us-sample-index')/$metadata#docs(*)",
  "@odata.count": 4959,
  "@search.nextPageParameters": {
    "search": "*",
    "count": true,
    "skip": 50
  },
  "value": [
    {
      "@search.score": 1,
      "listingId": "OTM4MjI2NQ2",
      "beds": 5,
      "baths": 4,
      "description": "This is a apartment residence and is perfect for entertaining.  This home provides lakefront property located close to parks and features a detached garage, beautiful bedroom floors and lots of storage.",
      "description_de": "Dies ist eine Wohnanlage und ist perfekt für Unterhaltung.  Dieses Haus bietet Seeliegenschaft Parks in der Nähe und verfügt über eine freistehende Garage schöne Zimmer-Etagen and viel Stauraum.",
      "description_fr": "Il s’agit d’un appartement de la résidence et est parfait pour se divertir.  Cette maison offre propriété au bord du lac Situé à proximité de Parcs et dispose d’un garage détaché, planchers de belle chambre and beaucoup de rangement.",
      "description_it": "Si tratta di un appartamento residence ed è perfetto per intrattenere.  Questa casa fornisce proprietà lungolago Situato vicino ai parchi e dispone di un garage indipendente, piani di bella camera da letto and sacco di stoccaggio.",
      "description_es": "Se trata de una residencia Apartamento y es perfecto para el entretenimiento.  Esta casa ofrece propiedad de lago situado cerca de parques y cuenta con un garaje independiente, pisos de dormitorio hermoso and montón de almacenamiento.",
      "description_pl": "Jest to apartament residence i jest idealny do zabawy.  Ten dom zapewnia lakefront Wlasciwosc usytuowany w poblizu parków i oferuje garaz wolnostojacy, piekna sypialnia podlogi and mnóstwo miejsca do przechowywania.",
      "description_nl": "Dit is een appartement Residentie en is perfect voor entertaining.  Dit huis biedt lakefront eigenschap vlakbij parken en beschikt over een vrijstaande garage, mooie slaapkamer vloeren and veel opslag.",
      "sqft": 12960,
      "daysOnMarket": 9,
      "status": "sold",
      "source": "Pérez Realty",
      "number": "19339",
      "street": "Linden Avenue North",
      "unit": "658",
      "type": "Apartment",
      "city": "Shoreline",
      "region": "wa",
      "countryCode": "us",
      "postCode": "98133",
      "location": {
        "type": "Point",
        "coordinates": [
          -122.35,
          47.7699
        ],
        "crs": {
          "type": "name",
          "properties": {
            "name": "EPSG:4326"
          }
        }
      },
      "price": 3693600,
      "thumbnail": "https://searchdatasets.blob.core.windows.net/images/bd5bt4apt.jpg",
      "tags": [
        "apartment residence",
        "entertaining",
        "lakefront property",
        "parks",
        "detached garage",
        "beautiful bedroom floors",
        "lots of storage"
      ]
    },
    ...
  ]
}
```

In [None]:
import requests


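def generate_text_from_index(search_term: str) -> str:
    # Query the Azure AI Search REST API for the top 10 documents matching the search term
    url = f"{search_endpoint}/indexes/{index_name}/docs/search?api-version=2024-07-01"
    # The `api-key` header authenticates the request with the key defined above
    headers = {"Content-Type": "application/json", "api-key": api_key}
    search_query = {"search": search_term, "top": 10}
    response = requests.post(url=url, headers=headers, json=search_query)

    text = ""
    if response.status_code == 200:
        results = response.json()
        for result in results["value"]:
            # Change "description" to the field you want to generate from
            text += result["description"]

    # Cap the text at 5,000 characters before passing it to the simulator
    return text[:5000]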

In [None]:
real_estate_search_term = "New York"
text = generate_text_from_index(real_estate_search_term)
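To adapt the extraction to another index, or to draw on more than one field, a helper along these lines may be useful. This is a sketch only: `collect_field_text` and `sample_results` are hypothetical, and the sample data is made up to mirror the response shape shown earlier.

```python
from typing import Any, Dict, List


def collect_field_text(results: Dict[str, Any], fields: List[str], max_chars: int = 5000) -> str:
    """Join the chosen string fields of each search hit into one block of text."""
    parts = []
    for hit in results.get("value", []):
        for field in fields:
            value = hit.get(field)
            # Skip missing or non-string fields instead of raising KeyError.
            if isinstance(value, str) and value:
                parts.append(value)
    return " ".join(parts)[:max_chars]


# Made-up response in the same shape as a search result.
sample_results = {
    "value": [
        {"description": "Lakefront apartment close to parks.", "city": "Shoreline", "beds": 5},
        {"description": "Condo with a detached garage.", "city": "Seattle"},
    ]
}
sample_text = collect_field_text(sample_results, ["description", "city"])
print(sample_text)
```

Joining with a space keeps the sentences of separate listings from running into each other, which can make the text easier for the LLM to read.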

### Call to simulator
This call to the simulator generates 4 query and response pairs in its first pass.
In the second pass, it picks one task, pairs it with a query generated in the first pass, and sends both to the configured LLM to build the first user turn. This user turn is then passed to the `callback` method. The conversation continues until it reaches `max_conversation_turns` turns.

The output of the simulator contains the original task, the original query, and the response generated from the first turn as the expected response. You can find them in the `context` key of the conversation.

In [None]:
outputs = await simulator(
    target=callback,
    text=text,
    num_queries=4,
    max_conversation_turns=3,
    tasks=[
        f"I am a prospective buyer and I want to learn more about {real_estate_search_term}",
        f"I am a real estate agent and I want to inform potential buyers about {real_estate_search_term}",
        f"I am a researcher and I want to do a detailed research on {real_estate_search_term}",
        f"I am a statistician and I want to do a detailed table of factual data concerning {real_estate_search_term}",
    ],
)
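To skim the generated conversations, a small helper like the one below can flatten them into readable lines. It assumes each conversation is dict-like with a `messages` list of `role`/`content` entries; `summarize_conversations` and `sample_outputs` are hypothetical, and the real simulator output may carry additional keys.

```python
from typing import Any, Dict, List


def summarize_conversations(conversations: List[Dict[str, Any]]) -> List[str]:
    """Return one line per turn, prefixed with the conversation index and role."""
    lines = []
    for index, conversation in enumerate(conversations):
        for turn in conversation.get("messages", []):
            lines.append(f"[{index}] {turn.get('role', '?')}: {turn.get('content', '')}")
    return lines


# Hypothetical output shaped like a chat transcript, used here in place of real simulator output.
sample_outputs = [
    {
        "messages": [
            {"role": "user", "content": "Is the apartment near a park?"},
            {"role": "assistant", "content": "Yes, the listing mentions nearby parks."},
        ]
    }
]
for line in summarize_conversations(sample_outputs):
    print(line)
```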

### Save the generated data for later use

In [None]:
output_file = Path("output.json")
# Open in write mode: appending a second JSON document on a rerun would make the file invalid JSON
with output_file.open("w") as f:
    json.dump(outputs, f)
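If you would rather keep one conversation per line, for example to add to the file across runs, JSON Lines is an option. The sketch below writes and reloads a made-up sample in a temporary directory; the `output.jsonl` file name and `sample_outputs` are assumptions for illustration.

```python
import json
import tempfile
from pathlib import Path

# Made-up stand-in for the simulator outputs.
sample_outputs = [{"messages": [{"role": "user", "content": "hello"}]}]

with tempfile.TemporaryDirectory() as tmp_dir:
    jsonl_file = Path(tmp_dir) / "output.jsonl"
    # Write one JSON document per line so later runs could append without corrupting the file.
    with jsonl_file.open("w", encoding="utf-8") as f:
        for conversation in sample_outputs:
            f.write(json.dumps(conversation) + "\n")
    # Read the file back line by line, skipping blank lines.
    with jsonl_file.open(encoding="utf-8") as f:
        reloaded = [json.loads(line) for line in f if line.strip()]

print(reloaded == sample_outputs)
```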

## Cleaning up

To clean up all Azure resources used in this example, you can delete the individual resources you created in this tutorial.

If you made a resource group specifically to run this example, you could instead [delete the resource group](https://learn.microsoft.com/en-us/azure/azure-resource-manager/management/delete-resource-group).