# Tune & Evaluation Guide

In the Beginner’s Guide, we went through the process of creating an API Key, creating a Datastore and ingesting documents, creating an Agent, and querying the Agent. This guide covers the next steps of tuning and evaluating your Agent. Make sure you’ve gone through all the steps in the Beginner’s Guide first.

## Tune

We've created a powerful set of APIs that enable you to specialize Agents to your data. Tuning often leads to significant improvements in performance for your specific use cases.

### 1. Create a tune job

To create a tune job, you need a training file and can optionally provide a test file. If no test file is provided, the API will automatically perform a train-test split on the training file.
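
If you would rather control the split yourself and upload a separate test file, a local split can be sketched as below. This is a minimal illustration: the helper name `split_train_test`, the 80/20 ratio, and the fixed seed are assumptions, not the values the API uses for its automatic split.

```python
import random

def split_train_test(records, test_fraction=0.2, seed=0):
    """Shuffle a copy of `records` and split it into (train, test) lists.

    test_fraction and seed are illustrative defaults, not values used by the API.
    """
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical records standing in for real training examples
records = [{"prompt": f"question {i}"} for i in range(10)]
train, test = split_train_test(records)
print(len(train), len(test))
```

Fixing the seed keeps the same examples in the test set across runs, which makes results from different tune jobs easier to compare.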

The API expects the data to be in JSON format with four required fields: `guideline`, `prompt`, `reference`, and `knowledge`. See the [API docs](https://docs.contextual.ai/reference/create_tune_job_agents__agent_id__tune_post) for an explanation of each of these fields. Here is a [dummy example of what a tune set should look like](https://drive.google.com/drive/folders/1exULG56OXIquVI7N7NRSD4TKyPWATgXR?usp=drive_link).
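
Before uploading, it can help to check that every record carries the four required fields. The sketch below is a local sanity check only. The helper name `validate_tune_set`, the sample values, and the choice to store `knowledge` as a list of strings are assumptions, so consult the API docs for the exact schema.

```python
import json

REQUIRED_FIELDS = ("guideline", "prompt", "reference", "knowledge")

def validate_tune_set(records):
    """Return (index, missing_fields) pairs for records lacking a required field."""
    problems = []
    for index, record in enumerate(records):
        missing = [field for field in REQUIRED_FIELDS if field not in record]
        if missing:
            problems.append((index, missing))
    return problems

# Hypothetical record; every value here is made up for illustration.
example_records = [
    {
        "guideline": "Answer in one sentence.",
        "prompt": "What does the Beginner's Guide cover?",
        "reference": "It covers API keys, Datastores, and Agents.",
        # Assumption: knowledge is a list of supporting text snippets.
        "knowledge": ["The Beginner's Guide covers API keys, Datastores, and Agents."],
    }
]

problems = validate_tune_set(example_records)
print(problems)
print(json.dumps(example_records, indent=2))
```

An empty list means every record has all four fields. The check does not validate the field values themselves.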




Use the following command to create a tune job. You will need to pass in your `agent_id` and the `file_path` of your training file. If you do not provide a `model_id`, the Agent’s default model is used automatically.



Insert your API key here 👇

In [1]:
CONTEXTUAL_API_KEY="key-..."

In [7]:
import os
import requests
from contextual import ContextualAI

In [None]:
# create a client
client = ContextualAI(
    api_key=CONTEXTUAL_API_KEY,
)

# test the API Key
try:
    client.agents.list()  # any authenticated call works as a check
    print("Valid API Key.")
except Exception as e:
    print(f"Invalid API Key: {e}")


In [None]:
# Create an agent with name 'My First Agent'
try:
    create_agent_output = client.agents.create(
        name="My First Agent"
    )
    print(create_agent_output.model_dump_json())
    agent_id = create_agent_output.id
except Exception as e:
    print(f"Encountered error: {e}")


In [None]:
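# Download the dummy tune set if it is not already present locally
if not os.path.exists('data/Dummy_TuneSet.csv'):
    print("Fetching data/Dummy_TuneSet.csv")
    os.makedirs('data', exist_ok=True)  # make sure the target folder exists
    response = requests.get("https://raw.githubusercontent.com/ContextualAI/examples/refs/heads/main/02-tune-eval/data/Dummy_TuneSet.csv")
    response.raise_for_status()  # stop early on a failed download
    with open('data/Dummy_TuneSet.csv', 'wb') as f:
        f.write(response.content)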

In [None]:
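# Create the tune job from a JSON training file.
# Note: the cell above fetches Dummy_TuneSet.csv, while this cell opens
# Dummy_TuneSet.json; make sure the path points to a file that exists locally.
with open("data/Dummy_TuneSet.json", 'rb') as training_file:
    try:
        response = client.agents.tune.create(
            agent_id=agent_id,
            training_file=training_file,
        )
        job_id = response.id  # needed later to check the job status
        print(response.to_dict())
    except Exception as e:
        print(e)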

When the command runs, it returns a `job_id` for the tune job. Keep in mind that tuning will take several hours to complete.

### 2. Check the status of the tune job

You can check the status of the tune job by passing in the `agent_id` and `job_id`. When the job is complete, the status will change from `processing` to `completed`. The response payload will also contain the tuned `model_id` and the `evaluation_results` of the tuned model. The following code checks the current status of the job:

In [None]:
response = client.agents.tune.jobs.metadata(
    agent_id=agent_id,
    job_id=job_id,
)
response.job_status
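
To wait for completion rather than read the status once, a polling loop along these lines could be used. This is a sketch under stated assumptions: `wait_for_job` is a hypothetical helper, the `"failed"` status name is a guess (only `processing` and `completed` appear in this guide), and the interval and timeout are arbitrary. The demonstration uses a fake status source so it runs without calling the API.

```python
import time

def wait_for_job(get_status, terminal_statuses=("completed", "failed"),
                 poll_seconds=60.0, timeout_seconds=6 * 60 * 60, sleep=time.sleep):
    """Call get_status() until it returns a terminal status or the timeout passes.

    "completed" comes from the guide; "failed" is an assumed status name.
    """
    waited = 0.0
    while True:
        status = get_status()
        if status in terminal_statuses:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"Job still '{status}' after {waited:.0f} seconds")
        sleep(poll_seconds)
        waited += poll_seconds

# Offline demonstration with a fake status source; no API call is made here.
fake_statuses = iter(["processing", "processing", "completed"])
final_status = wait_for_job(lambda: next(fake_statuses), sleep=lambda seconds: None)
print(final_status)
```

With the real client, `get_status` could be `lambda: client.agents.tune.jobs.metadata(agent_id=agent_id, job_id=job_id).job_status`, reusing the call from the cell above.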


### 3. Deploy the tuned model

Before you can use the tuned model, you need to deploy it to your Agent. You can do so by editing the configuration of your Agent and passing in the tuned `model_id`. Currently, we only allow a single fine-tuned model to be deployed per tenant. Please see the [API docs](https://docs.contextual.ai/reference/edit_agent_agents__agent_id__put) for more information.

In [None]:
# get the model_id we just trained
try:
    response = client.agents.tune.jobs.metadata(
        agent_id=agent_id,
        job_id=job_id,
    )
    model_id = response.model_id
    print(f"model_id: {model_id}")
except Exception as e:
    print(e)

In [None]:
try:
    response = client.agents.update(
        agent_id=agent_id,  # the Agent to deploy the tuned model to
        llm_model_id=model_id,
    )
    print(response.to_dict())
except Exception as e:
    print(e)

The deployment might take a moment to complete.

### 4. Query your tuned model!
After you have deployed the tuned model, you can query it with the usual command. Make sure you pass in your new tuned `model_id`.

In [None]:
try:
    query = client.agents.query.create(
        agent_id=agent_id,
        llm_model_id=model_id,
        messages=[{
            # Input your question here
            "content": "What is the revenue of Apple?",
            "role": "user",
        }]
    )
    print(query.message.content)
except Exception as e:
    print(f"Encountered error: {e}")

## Eval

Evaluation endpoints allow you to evaluate your Agent using a set of prompts (questions) and reference (gold) answers. We support two metrics: equivalence and groundedness.

* The first metric (“equivalence”) evaluates if the Agent response is equivalent to the ground truth (model-driven binary classification).
* The second metric (“groundedness”) decomposes the Agent response into claims and then evaluates if the claims are grounded by the retrieved documents.
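
To make the two metrics concrete, the sketch below turns per-item judgments into a single number. It is purely illustrative: the function names and the simple averaging are assumptions, and the platform may aggregate its scores differently.

```python
def equivalence_rate(labels):
    """Fraction of responses judged equivalent to their reference (labels are 0 or 1)."""
    return sum(labels) / len(labels)

def groundedness_rate(claims_supported):
    """Fraction of claims in one response that the retrieved documents support."""
    return sum(claims_supported) / len(claims_supported)

# Hypothetical judgments: four responses, and three claims from one response
equivalence_labels = [1, 0, 1, 1]
claim_support = [True, True, False]

print(equivalence_rate(equivalence_labels))  # 0.75
print(groundedness_rate(claim_support))
```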

### 1. Create an evaluation job
You will need to provide the evaluation data. You can provide the evaluation data in two ways: (i) by uploading an `evalset_file` as a CSV or (ii) by creating an eval `Dataset` through the Dataset API. We will be focusing on (i), but you can read about (ii) in our API Docs.

The API expects the data to be in CSV format with two required columns: `prompt` and `reference`. `prompt` is the question, while `reference` is the correct ground truth answer. See the [API docs](https://docs.contextual.ai/reference/create_evaluation_agents__agent_id__evaluate_post) for an explanation of each of these fields. Here is a [dummy example of what an eval set should look like](https://drive.google.com/drive/folders/1exULG56OXIquVI7N7NRSD4TKyPWATgXR?usp=drive_link):

In [None]:
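# Download the dummy eval set if it is not already present locally
if not os.path.exists('data/Dummy_EvalSet.csv'):
    print("Fetching data/Dummy_EvalSet.csv")
    os.makedirs('data', exist_ok=True)  # make sure the target folder exists
    response = requests.get("https://raw.githubusercontent.com/ContextualAI/examples/refs/heads/main/02-tune-eval/data/Dummy_EvalSet.csv")
    response.raise_for_status()  # stop early on a failed download
    with open('data/Dummy_EvalSet.csv', 'wb') as f:
        f.write(response.content)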
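
If you want to build your own eval set instead of using the dummy file, the two-column CSV can be written with the standard library. The sketch below uses hypothetical helper names (`write_eval_csv`, `missing_columns`) and made-up rows. It writes to an in-memory buffer, which you would swap for a file opened with `newline=""` in practice.

```python
import csv
import io

REQUIRED_COLUMNS = ["prompt", "reference"]

def write_eval_csv(rows, handle):
    """Write rows (dicts with prompt/reference keys) as CSV with a header line."""
    writer = csv.DictWriter(handle, fieldnames=REQUIRED_COLUMNS)
    writer.writeheader()
    writer.writerows(rows)

def missing_columns(handle):
    """Return the required columns that are absent from the CSV header."""
    header = next(csv.reader(handle), [])
    return [column for column in REQUIRED_COLUMNS if column not in header]

# Hypothetical rows; replace with your own questions and gold answers.
rows = [
    {"prompt": "What does the Tune step produce?", "reference": "A tuned model_id."},
    {"prompt": "Which metrics are supported?", "reference": "Equivalence and groundedness."},
]

buffer = io.StringIO()
write_eval_csv(rows, buffer)
buffer.seek(0)  # rewind so the header can be read back
missing = missing_columns(buffer)
print(missing)
```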

Use the following command to create your evaluation job. You will need to pass in your `agent_id` and the `file_path` of your evaluation set. In the example below, we are evaluating on both equivalence and groundedness, but you can choose to evaluate only one of them.

In [97]:
with open('data/Dummy_EvalSet.csv', 'rb') as f:
    eval_result = client.agents.evaluate.create(
        agent_id=agent_id,
        metrics=["equivalence", "groundedness"],
        evalset_file=f
    )

### 2. Check the status of your evaluation job
You can use the following command to check the status of your evaluation job, where you’ll need to pass in your `agent_id` and evaluation `job_id`. If the evaluation job has completed, you will see your evaluation `metrics`, `job_metadata`, and the `dataset_name` where your eval metrics and row-by-row results are stored (you will need to use the `/datasets` API to view this dataset).

In [98]:
from tqdm import tqdm

# Fetch the current job metadata and render the progress as a bar
eval_status = client.agents.evaluate.jobs.metadata(agent_id=agent_id, job_id=eval_result.id)

progress = tqdm(total=eval_status.job_metadata.num_predictions)
progress.update(eval_status.job_metadata.num_processed_predictions)
progress.set_description("Evaluation Progress")
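
If you only need a number rather than a progress bar, the same two counters can be turned into a completion fraction. `completion_fraction` is a hypothetical helper, and the counts passed to it here are made up. In practice they would come from `eval_status.job_metadata.num_processed_predictions` and `eval_status.job_metadata.num_predictions`.

```python
def completion_fraction(num_processed, num_total):
    """Return progress in [0, 1]; 0.0 when the total is not yet known."""
    if num_total <= 0:
        return 0.0
    return min(num_processed / num_total, 1.0)

# Made-up counts for illustration
print(f"{completion_fraction(25, 100):.0%} complete")  # 25% complete
```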

### 3. View your evaluation results
In Step 2, you should be able to get a `dataset_name` when your evaluation job has completed. You can then view your raw evaluation results (equivalence and/or groundedness scores for each question-response pair) with the `/datasets` endpoint. You can use the following command:

In [100]:
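import json
import pandas as pd

# Retrieve the row-by-row results stored in the evaluation dataset
eval_results = client.agents.datasets.evaluate.retrieve(dataset_name=eval_status.dataset_name, agent_id=agent_id)
print(eval_results)

# Each line of the results is one JSON object; load them into a DataFrame
eval_objects = [json.loads(line) for line in eval_results.splitlines()]

df = pd.DataFrame(eval_objects)
df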
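
Once the results are parsed, per-metric averages give a quick summary. The sketch below works on a made-up JSON Lines string so it runs without the API. The key names `equivalence_score` and `groundedness_score` are hypothetical, so check the columns of your own results before reusing it.

```python
import json
from statistics import mean

def parse_jsonl(text):
    """Parse JSON Lines text into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

def mean_scores(rows, score_keys):
    """Average each score key over the rows that have a value for it."""
    summary = {}
    for key in score_keys:
        values = [row[key] for row in rows if row.get(key) is not None]
        if values:
            summary[key] = mean(values)
    return summary

# Made-up results; the score key names are assumptions, not the API's schema.
sample_text = (
    '{"prompt": "q1", "equivalence_score": 1, "groundedness_score": 0.5}\n'
    '{"prompt": "q2", "equivalence_score": 0, "groundedness_score": 1.0}\n'
)

rows = parse_jsonl(sample_text)
summary = mean_scores(rows, ["equivalence_score", "groundedness_score"])
print(summary)
```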


🎉 That was a quick spin through our tune and eval endpoints! To learn more about our APIs and their capabilities, visit [docs.contextual.ai](https://docs.contextual.ai). We look forward to seeing what you build with our platform 🏗️.

