# Optimize your model (fine-tune)

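In this notebook, you'll use the simulator in the Azure AI Evaluation SDK to generate a synthetic dataset. You'll then use that dataset to evaluate the quality of responses from an application built on your pre-trained model.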

## Before you start

Install the necessary libraries:

In [None]:
%pip install azure-ai-evaluation promptflow wikipedia

## Initialize components

Next, define the connection values used to send chat completion requests to your Azure OpenAI Service endpoint: the endpoint URL, the API key, the API version, and the model deployment name.

In [None]:
import os

# Define the base URL for your Azure OpenAI Service endpoint
# Replace 'Your Azure OpenAI Service Endpoint' with your actual endpoint URL obtained previously
os.environ["AZURE_OPENAI_ENDPOINT"] = 'Your Azure OpenAI Service Endpoint'

# Define the API key for your Azure OpenAI Service
# Replace 'Your Azure OpenAI Service API Key' with your actual API key obtained previously
os.environ["AZURE_OPENAI_API_KEY"] = 'Your Azure OpenAI Service API Key'

# Define the API version to use for the Azure OpenAI Service
os.environ["OPENAI_API_VERSION"] = '2024-08-01-preview'

# Define the name of your model deployment in Azure OpenAI Service
# Replace 'gpt-4' if your deployment has a different name
os.environ["AZURE_OPENAI_DEPLOYMENT"] = 'gpt-4'
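Before calling the service, it can help to fail fast if any of these variables is unset or still holds a placeholder. The following is a minimal sketch: the helper name `check_openai_settings` is hypothetical, and treating any value that starts with `Your ` as an unreplaced placeholder is an assumption based on the placeholder strings above.

```python
import os

# Variable names set in the cell above
REQUIRED_VARS = [
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
    "OPENAI_API_VERSION",
    "AZURE_OPENAI_DEPLOYMENT",
]


def check_openai_settings(env=None):
    """Return the names of required variables that are missing or still placeholders.

    Hypothetical helper: any value starting with 'Your ' is treated as an
    unreplaced placeholder, matching the placeholder text used in this notebook.
    """
    env = os.environ if env is None else env
    problems = []
    for name in REQUIRED_VARS:
        value = env.get(name, "").strip()
        if not value or value.startswith("Your "):
            problems.append(name)
    return problems


missing = check_openai_settings()
if missing:
    print("Set these environment variables before continuing:", ", ".join(missing))
```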

Next, prepare the source text that the simulator will use to generate queries. The following cell searches Wikipedia and keeps the first 5000 characters of the top result's page summary:

In [None]:
import wikipedia

# Prepare the text to send to the simulator
wiki_search_term = "Isaac Asimov"
wiki_title = wikipedia.search(wiki_search_term)[0]
wiki_page = wikipedia.page(wiki_title)
text = wiki_page.summary[:5000]
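Slicing at a fixed 5000 characters can end the text mid-sentence, which leaves the simulator a fragment to generate queries from. If that matters, one option is to cut back to the last sentence boundary inside the limit. This is a minimal sketch; `truncate_at_sentence` is a hypothetical helper, and treating `.`, `!`, or `?` followed by whitespace as a sentence end is a simplifying assumption (abbreviations such as "Dr." will also match).

```python
import re


def truncate_at_sentence(text: str, max_chars: int = 5000) -> str:
    """Trim text to at most max_chars, preferring to end at a sentence boundary.

    Hypothetical helper: a sentence end is '.', '!' or '?' followed by
    whitespace. Falls back to a hard cut if no boundary fits in the limit.
    """
    if len(text) <= max_chars:
        return text
    # Look one character past the limit so a sentence ending exactly at
    # max_chars (followed by whitespace) is still recognized.
    window = text[: max_chars + 1]
    last_end = None
    for match in re.finditer(r"[.!?](?=\s)", window):
        last_end = match.end()
    if last_end is None:
        # No sentence boundary inside the limit: hard cut
        return text[:max_chars]
    return text[:last_end]
```

In the notebook, you would use it in place of the slice, for example `text = truncate_at_sentence(wiki_page.summary, 5000)`.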

## Define callback function

The simulator can run against any application endpoint: you wrap the endpoint in a target callback function. In this example, the application is an LLM defined by the Prompty file `application.prompty`.

In [None]:
import os
from typing import Any, Dict, Optional

from promptflow.client import load_flow

async def callback(
    messages: Dict[str, Any],  # simulator payload with a "messages" list
    stream: bool = False,
    session_state: Any = None,  # noqa: ANN401
    context: Optional[Dict[str, Any]] = None,
) -> dict:
    messages_list = messages["messages"]
    # Get the last message
    latest_message = messages_list[-1]
    query = latest_message["content"]
    context = latest_message.get("context", None) # looks for context, default None
    # Call your endpoint or AI application here
    current_dir = os.getcwd()
    prompty_path = os.path.join(current_dir, "application.prompty")
    _flow = load_flow(source=prompty_path)
    response = _flow(query=query, context=context, conversation_history=messages_list)
    # Format the response to follow the OpenAI chat protocol
    formatted_response = {
        "content": response,
        "role": "assistant",
        "context": context,
    }
    messages["messages"].append(formatted_response)
    return {
        "messages": messages["messages"],
        "stream": stream,
        "session_state": session_state,
        "context": context
    }


The callback function above processes each message generated by the simulator.

Tasks performed by the function:

* Retrieves the latest user message.
* Loads a prompt flow from `application.prompty`.
* Generates a response using the prompt flow.
* Formats the response to adhere to the OpenAI chat protocol.
* Appends the assistant's response to the messages list.
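To check the message contract without Azure credentials or the Prompty file, you can apply the same steps with a stand-in application. This is a minimal sketch: `fake_app`, `append_assistant_reply`, and `stub_callback` are hypothetical names, and the input shape (a dict holding a `messages` list) mirrors what the callback above reads rather than a documented schema.

```python
from typing import Any, Callable, Dict, Optional


def fake_app(query: str, context: Optional[Any]) -> str:
    """Hypothetical stand-in for the Prompty flow: echoes the query back."""
    return f"You asked: {query}"


def append_assistant_reply(
    messages: Dict[str, Any],
    app: Callable[[str, Optional[Any]], str] = fake_app,
) -> Dict[str, Any]:
    """Apply the callback's steps to a simulator-style payload."""
    messages_list = messages["messages"]
    latest_message = messages_list[-1]       # latest user turn
    query = latest_message["content"]
    context = latest_message.get("context")  # None when absent
    # Assistant message in the OpenAI chat protocol shape
    reply = {"content": app(query, context), "role": "assistant", "context": context}
    messages_list.append(reply)
    return messages


async def stub_callback(
    messages: Dict[str, Any],
    stream: bool = False,
    session_state: Any = None,
    context: Optional[Dict[str, Any]] = None,
) -> dict:
    """Async wrapper with the same signature as the notebook's callback."""
    updated = append_assistant_reply(messages)
    return {
        "messages": updated["messages"],
        "stream": stream,
        "session_state": session_state,
        "context": updated["messages"][-1]["context"],
    }
```

In the notebook, `await stub_callback({"messages": [{"role": "user", "content": "Who was Isaac Asimov?"}]})` returns the payload with an assistant reply appended. Keeping the protocol logic in a synchronous function makes it easy to test on its own.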

## Run the simulator

You can now initialize the simulator and run it to generate synthetic conversations based on the provided text:

In [None]:
from azure.ai.evaluation.simulator import Simulator

model_config = {
    "azure_endpoint": os.environ.get("AZURE_OPENAI_ENDPOINT"),
    "api_key": os.environ.get("AZURE_OPENAI_API_KEY"),
    "azure_deployment": os.environ.get("AZURE_OPENAI_DEPLOYMENT"),
}

simulator = Simulator(model_config=model_config)

outputs = await simulator(
    target=callback,
    text=text,
    num_queries=1,  # Minimal number of queries
)

output_file = "simulation_output.jsonl"
with open(output_file, "w") as file:
    for output in outputs:
        file.write(output.to_eval_qr_json_lines())

You can open `simulation_output.jsonl` to review each query the simulator generated alongside your application's response.
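For a quick look without opening the file, you can load the JSON Lines output and print a short line per record. This is a minimal sketch; `load_jsonl` and `summarize_pairs` are hypothetical helpers, and it assumes each record has `query` and `response` keys, which is the query/response layout the evaluation step reads (missing keys are shown as empty strings).

```python
import json
from typing import Dict, List


def load_jsonl(path: str) -> List[Dict]:
    """Read a JSON Lines file into a list of dicts, skipping blank lines."""
    records = []
    with open(path, "r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def summarize_pairs(records: List[Dict], width: int = 80) -> List[str]:
    """Return one 'Q: ... | A: ...' line per record, truncated to width.

    Assumes 'query' and 'response' keys; missing keys show as ''.
    """
    lines = []
    for record in records:
        query = str(record.get("query", ""))[:width]
        response = str(record.get("response", ""))[:width]
        lines.append(f"Q: {query} | A: {response}")
    return lines
```

Usage: `for line in summarize_pairs(load_jsonl("simulation_output.jsonl")): print(line)`.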

## Evaluate simulation for groundedness

Now that you have a dataset, you can evaluate the quality and effectiveness of your generative AI application. In this example, you will use groundedness as your quality metric: it measures how well the application's responses are supported by the provided context.

In [None]:
from azure.ai.evaluation import GroundednessEvaluator, evaluate

groundedness_evaluator = GroundednessEvaluator(model_config=model_config)
eval_output = evaluate(
    data=output_file,
    evaluators={
        "groundedness": groundedness_evaluator
    },
    output_path="groundedness_eval_output.json"
)

# Aggregate scores; per-row results are in eval_output["rows"]
print(eval_output["metrics"])

If the groundedness score is lower than you want (the `GroundednessEvaluator` rates each response on a scale of 1 to 5, where higher means better grounded), adjust LLM parameters such as `temperature`, `top_p`, `presence_penalty`, or `frequency_penalty` in the `application.prompty` file. Then re-run the notebook cells to generate a new dataset and evaluate it again.
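Before changing parameters, it can help to see which responses scored poorly. This is a minimal sketch over the `rows` list returned by `evaluate`: the column name `outputs.groundedness.groundedness` and the threshold of 3 are assumptions, so check the keys of one row in your own results and adjust `SCORE_KEY` and `threshold` as needed. The helper names are hypothetical.

```python
from statistics import mean
from typing import Any, Dict, List, Optional

# Assumed column name for per-row groundedness scores; verify against your rows
SCORE_KEY = "outputs.groundedness.groundedness"


def low_groundedness_rows(
    rows: List[Dict[str, Any]],
    threshold: float = 3.0,
    score_key: str = SCORE_KEY,
) -> List[Dict[str, Any]]:
    """Return rows whose numeric score is below threshold; rows without a score are skipped."""
    flagged = []
    for row in rows:
        score = row.get(score_key)
        if isinstance(score, (int, float)) and score < threshold:
            flagged.append(row)
    return flagged


def mean_score(rows: List[Dict[str, Any]], score_key: str = SCORE_KEY) -> Optional[float]:
    """Average of the numeric scores, or None if no row has one."""
    scores = [row[score_key] for row in rows if isinstance(row.get(score_key), (int, float))]
    return mean(scores) if scores else None
```

Usage: `low_groundedness_rows(eval_output["rows"])` lists the rows to inspect before you adjust `application.prompty` and re-run.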

## Conclusion

In this exercise, you created a synthetic dataset simulating a conversation between a user and a chat completion app. You can use this dataset to evaluate the quality of your app's responses and fine-tune the app to achieve the desired results.

## Clean up

If you've finished the exercise, you should delete the resources you have created to avoid incurring unnecessary Azure costs.

1. Return to the browser tab containing the Azure portal (or re-open the [Azure portal](https://portal.azure.com?azure-portal=true) in a new browser tab) and view the contents of the resource group where you deployed the resources used in this exercise.
1. On the toolbar, select **Delete resource group**.
1. Enter the resource group name and confirm that you want to delete it.