## 👉 START HERE: How to use this notebook

### Step 1: Create synthetic evaluation data

To measure your Agent's quality, you need a diverse, representative evaluation set.  This notebook turns your unstructured documents into a high-quality synthetic evaluation set so that you can start to evaluate and improve your Agent's quality before subject matter experts are available to label data.

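This notebook does the following:
1. Loads the source documents, by default the parsed documents table produced by the [data pipeline](01_data_pipeline.ipynb).
2. Generates synthetic evaluation data from those documents with `generate_evals_df`.
3. Appends the generated evaluation data to the Agent's evaluation set table.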

**Important note:** Throughout this notebook, we indicate which cells you:
- ✅✏️ *should* customize - these cells contain config settings to change
- 🚫✏️ *typically will not* customize - these cells contain code that is parameterized by your configuration.

*Cells that don't require customization still need to be run!*

### 🚫✏️ Install Python libraries

In [0]:
%pip install -qqqq -U -r requirements.txt
dbutils.library.restartPython()

### 🚫✏️ Connect to Databricks

If you are running locally in an IDE using Databricks Connect, connect the Spark client and configure MLflow to use Databricks Managed MLflow. If you are running in a Databricks Notebook, these values are already set.

In [41]:
from mlflow.utils import databricks_utils as du

if not du.is_in_databricks_notebook():
    from databricks.connect import DatabricksSession
    import os

    spark = DatabricksSession.builder.getOrCreate()
    os.environ["MLFLOW_TRACKING_URI"] = "databricks"

In [4]:
%load_ext autoreload
%autoreload 2


### 🚫✏️ Load the Agent's storage locations

This notebook writes to the evaluation set table that you specified in the [Agent setup](02_agent_setup.ipynb) notebook.

In [42]:
from utils.cookbook.agent_storage_config import AgentStorageConfig
from utils.cookbook.databricks_utils import get_table_url

# Load the Agent's storage configuration
agent_storage_config = AgentStorageConfig.from_yaml_file('./configs/agent_storage_config.yaml')

# Check if the evaluation set already exists
try:
    eval_dataset = spark.table(agent_storage_config.evaluation_set_uc_table)
    if eval_dataset.count() > 0:
        print(f"Evaluation set {get_table_url(agent_storage_config.evaluation_set_uc_table)} already exists!  By default, this notebook will append to the evaluation dataset.  If you would like to overwrite the existing evaluation set, please delete the table before running this notebook.")
    else:
        print(f"Evaluation set {get_table_url(agent_storage_config.evaluation_set_uc_table)} exists, but is empty!  By default, this notebook will NOT change the schema of this table - if you experience schema related errors, drop this table before running this notebook so it can be recreated with the correct schema.")
except Exception:
    print(f"Evaluation set `{agent_storage_config.evaluation_set_uc_table}` does not exist.  This notebook will create a new Delta Table at this location.")

Evaluation set `main.eric_peter_agents.my_agent_eval_set` does not exist.  This notebook will create a new Delta Table at this location.


#### ✅✏️ Load the source documents for synthetic evaluation data generation

Most often, the source documents are the document output table from the [data pipeline](01_data_pipeline.ipynb). The code below loads that table.

Alternatively, this can be a Spark DataFrame, Pandas DataFrame, or list of dictionaries with the following keys/columns:
- `doc_uri`: A URI pointing to the document.
- `content`: The content of the document.
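As a minimal sketch of the list-of-dictionaries option, the cell below builds a small in-memory document list and checks that each entry has the two keys listed above. The helper `find_invalid_docs` and the example URIs and text are hypothetical and exist only for illustration; they are not part of the cookbook utilities.

```python
# Keys that each source document needs, per the list above.
REQUIRED_KEYS = {"doc_uri", "content"}


def find_invalid_docs(docs):
    """Return the indexes of docs that lack a required key or have blank content.

    Hypothetical helper for illustration; not part of the cookbook.
    """
    invalid = []
    for i, doc in enumerate(docs):
        missing = REQUIRED_KEYS - doc.keys()
        # Only inspect `content` when no required key is missing.
        if missing or not str(doc["content"]).strip():
            invalid.append(i)
    return invalid


# Illustrative documents; replace these with your own content.
example_docs = [
    {"doc_uri": "docs/example_1.md", "content": "First example document text."},
    {"doc_uri": "docs/example_2.md", "content": "Second example document text."},
]

print(find_invalid_docs(example_docs))
```

If the check returns an empty list, a list like `example_docs` has the expected shape for the `docs` argument in place of the Spark table loaded below.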

In [43]:
from utils.data_pipeline.data_pipeline_config import DataPipelineConfig
from pyspark.errors import PySparkAttributeError

datapipeline_config = DataPipelineConfig.from_yaml_file('./configs/data_pipeline_config.yaml')

source_documents = spark.table(datapipeline_config.output.parsed_docs_table)

try: # if in DBX notebook, render the dataframe, otherwise print it
    source_documents.display()
except PySparkAttributeError as e:
    source_documents.show()

+--------------------+-------------+--------------------+-------------------+
|             content|parser_status|             doc_uri|      last_modified|
+--------------------+-------------+--------------------+-------------------+
|**1. Create a UC ...|      SUCCESS|/Volumes/ep/cookb...|2024-11-04 01:32:30|
+--------------------+-------------+--------------------+-------------------+



#### ✅✏️ Run the synthetic evaluation data generation

Optionally, you can customize the guidelines to guide the synthetic data generation.  By default, guidelines are not applied - to apply the guidelines, uncomment `guidelines=guidelines` in the `generate_evals_df(...)` call.  See our [documentation](https://docs.databricks.com/en/generative-ai/agent-evaluation/synthesize-evaluation-set.html) for more details.

In [44]:
from databricks.agents.eval import generate_evals_df

# NOTE: The guidelines you provide are a free-form string. The markdown below is the suggested format,
# but you are free to add your own sections. These guidelines prompt the LLM that generates the
# synthetic data, so you may need to iterate on them before you get the results you want.
guidelines = """
# Task Description
The Agent is a RAG chatbot that answers questions about using Spark on Databricks. The Agent has access to a corpus of Databricks documents, and its task is to answer the user's questions by retrieving the relevant docs from the corpus and synthesizing a helpful, accurate response. The corpus covers a lot of info, but the Agent is specifically designed to interact with Databricks users who have questions about Spark. So questions outside of this scope are considered irrelevant.

# User personas
- A developer who is new to the Databricks platform
- An experienced, highly technical Data Scientist or Data Engineer

# Example questions
- what API lets me parallelize operations over rows of a delta table?
- Which cluster settings will give me the best performance when using Spark?

# Additional Guidelines
- Questions should be succinct, and human-like
"""

synthesized_evals_df = generate_evals_df(
    docs=source_documents,
    # The number of evaluations to generate for each doc.
    num_questions_per_doc=2,
    # An optional set of guidelines that help guide the synthetic generation. This is a free-form string that will be used to prompt the generation.
    # guidelines=guidelines
)

# Write the synthetic evaluation data to the evaluation set table
spark.createDataFrame(synthesized_evals_df).write.format("delta").mode("append").saveAsTable(agent_storage_config.evaluation_set_uc_table)

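# Display the synthetic evaluation data
# Keep a reference to the DataFrame itself; `.show()` returns None, so do not call it here.
eval_set_df = spark.table(agent_storage_config.evaluation_set_uc_table)
try:  # if in a Databricks notebook, render the DataFrame; otherwise print it
    eval_set_df.display()
except PySparkAttributeError:
    eval_set_df.show()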

Generating evaluations:   0%|          | 0/1 documents processed [Elapsed: 00:00, Remaining: ?]I0000 00:00:1730763702.841937 1622882 fork_posix.cc:77] Other threads are currently calling into gRPC, skipping fork() handlers
Generating evaluations: 100%|██████████| 1/1 documents processed [Elapsed: 00:00, Remaining: 00:00]


PySparkValueError: [CANNOT_INFER_EMPTY_SCHEMA] Can not infer schema from an empty dataset.
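
The `CANNOT_INFER_EMPTY_SCHEMA` error above suggests that `generate_evals_df` returned no rows in this run, so `spark.createDataFrame(...)` had nothing to infer a schema from. One possible guard, sketched below, is to skip the write when the result is empty. The helper `append_if_not_empty` is hypothetical and not part of the cookbook; it assumes only that the result supports `len()`, which holds for a pandas DataFrame or a list.

```python
def append_if_not_empty(evals, write_fn):
    """Call write_fn(evals) only when evals has at least one row.

    Hypothetical helper for illustration. `evals` can be any object that
    supports len(), such as a pandas DataFrame or a list of dicts.
    Returns True when the write was attempted, False when it was skipped.
    """
    if evals is None or len(evals) == 0:
        print("No synthetic evaluations were generated; skipping the write.")
        return False
    write_fn(evals)
    return True


# In this notebook, the call might look like the following (left commented out
# because it needs a live Spark session and the Agent's storage config):
#
# append_if_not_empty(
#     synthesized_evals_df,
#     lambda df: spark.createDataFrame(df)
#     .write.format("delta")
#     .mode("append")
#     .saveAsTable(agent_storage_config.evaluation_set_uc_table),
# )
```

If the write is skipped, check that the source documents contain enough content and, if you use them, that the guidelines do not exclude every document.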