<img src="https://imagedelivery.net/Dr98IMl5gQ9tPkFM5JRcng/3e5f6fbd-9bc6-4aa1-368e-e8bb1d6ca100/Ultra" alt="Image description" width="160" />

<br/>

# Specialization Deep Dive

This notebook is a deep dive into specializing and improving your Contextual AI agents. It focuses on showing you the specific settings; for guidance on when and why each setting is useful, please consult the full documentation at [docs.contextual.ai](https://docs.contextual.ai/).

This notebook covers the following steps:
- Queries / Retrieval
- Evaluation with LMUnit
- Modifying System Prompt
- Datastore Filter
- Retrieval Settings
- Filter Model / Prompt
- Generation Settings

For usage data and agent feedback, check out the metrics API example in the [quick start notebook](https://github.com/ContextualAI/examples/tree/main/01-getting-started/quick-start.ipynb).


This notebook requires an existing agent. Build one first by going through the [getting started](https://github.com/ContextualAI/examples/tree/main/01-getting-started) notebook or the [hands on lab](https://github.com/ContextualAI/examples/tree/main/02-hands-on-lab).

To run this notebook interactively, you can open it in Google Colab.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ContextualAI/examples/blob/main/06-improve-agent-performance/improvement-overview.ipynb)

In [None]:
%pip install contextual-client

In [2]:
import os
import requests
import json
from pathlib import Path
from typing import List, Optional, Dict
from IPython.display import display, JSON
import pandas as pd
from contextual import ContextualAI
import ast
from tqdm import tqdm

In [12]:
# The API key is read from the CONTEXTUAL_API_KEY environment variable; avoid hardcoding it in the notebook
API_KEY = os.environ["CONTEXTUAL_API_KEY"]
client = ContextualAI(api_key = API_KEY)

Load up the files you will need

In [None]:
def fetch_file(filepath):
    """Download a file from the examples repo to the same relative path locally."""
    directory = os.path.dirname(filepath)
    if directory:  # dirname is empty for a bare filename, and makedirs('') would fail
        os.makedirs(directory, exist_ok=True)

    print(f"Fetching {filepath}")
    response = requests.get(f"https://raw.githubusercontent.com/ContextualAI/examples/main/01-getting-started/{filepath}")
    response.raise_for_status()  # Stop on HTTP errors instead of saving an error page

    with open(filepath, 'wb') as f:
        f.write(response.content)

fetch_file('data/eval_short.csv')

## 1: Queries and Retrievals

A first step to understanding our agent is passing it queries. Contextual AI will return a response.

Besides the model response, it's also possible to retrieve the full text of all the attributions/citations. You can also retrieve images of the bounding boxes for attributions.


Let's start with the prebuilt agent

In [4]:
agent_id = 'YOUR_AGENT_ID'

The query here also sets the optional `include_retrieval_content_text` parameter, which includes the text of the retrieved contexts in the result. Normally, you would leave this off for faster responses. It is enabled here to show you the full text of the information available to developers.

In [14]:
query_result = client.agents.query.create(
    agent_id=agent_id,
    messages=[{
        # Input your question here
        "content": "Tell about Apple's sales",
        "role": "user"
    }],
    include_retrieval_content_text = True
)

I can now see the results of the query. If you are using the financial RAG agent with the Apple 10-Q, you should see some formatted markdown with sales by product.

In [None]:
content = query_result.message.content
print(content)

Here I show one of the retrieved chunks (index 3 in `retrieval_contents`) that was relevant to the query. If you are using the financial RAG agent with the Apple 10-Q, you should see a text chunk devoted to Apple's product sales.

In [None]:
context_text = query_result.retrieval_contents[3].content_text
print(context_text)
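
To look over everything that was retrieved rather than a single chunk, a small loop can help. This is a minimal sketch, not part of the client: `summarize_retrievals` is a hypothetical helper, `content_text` is the attribute used above, and `doc_name` and `page` are assumed attribute names that are read defensively with `getattr`. The sample objects are stand-ins so the cell runs without an API call.

```python
from types import SimpleNamespace


def summarize_retrievals(retrieval_contents, preview_chars=80):
    """Return one summary line per retrieved chunk."""
    lines = []
    for index, item in enumerate(retrieval_contents):
        # doc_name and page are assumed attribute names; fall back if absent.
        doc_name = getattr(item, "doc_name", "unknown document")
        page = getattr(item, "page", "?")
        text = getattr(item, "content_text", None) or ""
        # Collapse whitespace so each preview stays on a single line.
        preview = " ".join(text.split())[:preview_chars]
        lines.append(f"[{index}] {doc_name} (page {page}): {preview}")
    return lines


# Stand-in objects (not real API output) so the sketch runs offline.
sample_contents = [
    SimpleNamespace(doc_name="Apple.pdf", page=12, content_text="Net sales by\nproduct"),
    SimpleNamespace(content_text=None),
]
for line in summarize_retrievals(sample_contents):
    print(line)
```

With a live result, you would pass `query_result.retrieval_contents` instead of `sample_contents`.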



## 2: Running Evaluation Jobs

Contextual AI lets you evaluate responses with natural language unit tests (LMUnit).



### 2.1 LMUnit

The `lmunit` endpoint supports natural language unit tests. To learn more, check out the [blog post](https://contextual.ai/blog/lmunit/) or a [notebook using LMUnit](https://github.com/ContextualAI/examples/tree/main/03-lmunit).
Here is a simple example of a natural language unit test.


In [None]:
response = client.lmunit.create(
                    query="What material is used in N95 masks?",
                    response="N95 masks are made primarily of polypropylene. This synthetic material is created through a melt-blowing process that creates multiple layers of microfibers. The material was chosen because it can be electrostatically charged to attract particles. Particles are the constituents of the universe",
                    unit_test="Does the response avoid unnecessary information?"
                )
print(response.score)

The output is a numerical score on a scale of 1 to 5.
The low score here, `2.065`, makes sense because the response is padded with information the query did not ask for, such as its final sentence about particles.
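
A single response is often checked against several criteria. The sketch below shows one way to do that. `score_unit_tests`, `make_lmunit_scorer`, and `fake_scorer` are hypothetical helpers written for this notebook, and the only client call assumed is `client.lmunit.create` with the same three arguments used above. The fake scorer returns made-up numbers so the cell runs offline.

```python
def score_unit_tests(score_fn, query, response, unit_tests):
    """Score one response against several natural language unit tests.

    score_fn is any callable that accepts query, response, and unit_test
    keyword arguments and returns a numeric score.
    """
    results = []
    for unit_test in unit_tests:
        score = score_fn(query=query, response=response, unit_test=unit_test)
        results.append({"unit_test": unit_test, "score": float(score)})
    # Lowest scores first, so the weakest criteria are easy to spot.
    return sorted(results, key=lambda row: row["score"])


def make_lmunit_scorer(client):
    """Wrap client.lmunit.create (called as in the cell above) into a score_fn."""
    def score_fn(query, response, unit_test):
        result = client.lmunit.create(query=query, response=response, unit_test=unit_test)
        return result.score
    return score_fn


def fake_scorer(query, response, unit_test):
    """Offline stand-in with made-up scores; real scores come from LMUnit."""
    return 2.0 if "unnecessary" in unit_test else 4.5


rows = score_unit_tests(
    fake_scorer,
    query="What material is used in N95 masks?",
    response="N95 masks are made primarily of polypropylene.",
    unit_tests=[
        "Does the response name a specific material?",
        "Does the response avoid unnecessary information?",
    ],
)
for row in rows:
    print(f"{row['score']:.2f}  {row['unit_test']}")
```

To score against the real endpoint, pass `make_lmunit_scorer(client)` in place of `fake_scorer`.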

## 3: Modifying the System Prompt

After initial testing, you may want to revise the system prompt. Here I have an updated prompt with additional information in the critical guidelines section.

In [27]:
system_prompt2 = '''
You are an AI assistant specialized in financial analysis and reporting. Your responses should be precise, accurate, and sourced exclusively from official financial documentation provided to you. Please follow these guidelines:

Data Analysis & Response Quality:
* Only use information explicitly stated in provided documentation (e.g., earnings releases, financial statements, investor presentations)
* Present comparative analyses using structured formats with tables and bullet points where appropriate
* Include specific period-over-period comparisons (quarter-over-quarter, year-over-year) when relevant
* Maintain consistency in numerical presentations (e.g., consistent units, decimal places)
* Flag any one-time items or special charges that impact comparability


For any analysis, provide comprehensive insights using all relevant available information while maintaining strict adherence to these guidelines and focusing on delivering clear, actionable information.
'''


Let's now update the agent and verify the changes by checking the agent metadata.

In [None]:
client.agents.update(agent_id=agent_id, system_prompt=system_prompt2)

agent_config = client.agents.metadata(agent_id=agent_id)
print(agent_config.system_prompt)

Modifying the system prompt is useful when you want to improve response generation, for example by making responses more concise or more professional, or by requiring specific terms to appear in the response.

## 4: Datastore Filter

Document metadata can be used to limit which documents are included in retrieval. This section shows how to add custom metadata to documents and how to filter on it.
A typical use case is a query that should use only a subset of documents, for example a specific company in a datastore that contains financial documents for multiple companies. Other examples are restricting retrieval to a specific year or product name.
Let's walk through adding metadata to a document and then filtering on it.

Let's start by adding metadata. Be aware that metadata is case sensitive. This example is based on the financial RAG use case with the Apple.pdf document. Fill in the datastore ID and the document ID for Apple.pdf below.

In [None]:
datastore_id = 'YOUR_DATASTORE_ID'  # the datastore that contains Apple.pdf
document_id = 'YOUR_DOCUMENT_ID'  # the ID of the Apple.pdf document
result = client.datastores.documents.set_metadata(datastore_id=datastore_id,
                        document_id=document_id,
                        custom_metadata={"Company": "Apple"})

You can see the new custom metadata field by viewing the document's metadata.

In [None]:
metadata = client.datastores.documents.metadata(datastore_id = datastore_id, 
                        document_id = document_id)
print("Document metadata:", metadata.custom_metadata)

Now let's use document filtering. We first query the agent without a filter to make sure we are getting a response. With a filter, only the documents that match it pass through to the retrieval stage. Here are three variations:
- no filtering
- filtering that includes the Apple document based on its metadata
- filtering that excludes the Apple document

Also remember that the values are case sensitive.

In [None]:
query_result = client.agents.query.create(
    agent_id=agent_id,
    messages=[{
        "content": "what was the sales for Apple",
        "role": "user"
    }],
)
print(query_result.message)

In [None]:
query_result = client.agents.query.create(
    agent_id=agent_id,
    messages=[{
        "content": "what was the sales for Apple",
        "role": "user"
    }],
    documents_filters={
        "operator": "AND",
        "filters": [
            # Must match the stored metadata exactly; values are case sensitive
            {"field": "Company", "operator": "equals", "value": "Apple"}
        ]
    }
)
print(query_result.message)

You see results here because the filter includes the Apple.pdf document based on its metadata.

In [None]:
query_result = client.agents.query.create(
    agent_id=agent_id,
    messages=[{
        "content": "what was the sales for Apple",
        "role": "user"
    }],
    documents_filters= {
        "operator": "AND",
        "filters": [
            {"field": "Company", "operator": "equals", "value": "Nike"}
        ]
    }
)
print(query_result.message)

As expected, the agent is not able to respond properly to the sales query. The filter excludes Apple.pdf because its `Company` metadata field does not have the value `Nike`.
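
The filter dictionaries in the cells above all share one shape, so they can be built with a pair of small helpers. This is a sketch only: `equals_filter` and `all_of` are hypothetical names, the structure simply mirrors the `documents_filters` argument used above, and the `Year` field is a made-up metadata field that you would need to set on your documents first.

```python
def equals_filter(field, value):
    """One leaf condition; field names and values are case sensitive."""
    return {"field": field, "operator": "equals", "value": value}


def all_of(*conditions):
    """Combine leaf conditions with the AND operator used in the cells above."""
    return {"operator": "AND", "filters": list(conditions)}


apple_only = all_of(equals_filter("Company", "Apple"))

# "Year" is a hypothetical metadata field, shown only to illustrate combining conditions.
apple_2022 = all_of(equals_filter("Company", "Apple"), equals_filter("Year", "2022"))

print(apple_only)
print(apple_2022)
```

The result can be passed directly as `documents_filters=apple_only` in `client.agents.query.create`.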

## 5: Retrieval Settings

Several settings are available for tuning retrieval. These are advanced parameters, and most users will get good results by leaving them at the platform defaults. For more detail on the advanced settings, please refer to the [documentation](https://docs.contextual.ai/).

At the global level:
- enable_rerank: Enable/disable the use of the reranker model
- enable_filter: The filter is a capability in the platform to remove irrelevant or low-quality information before it's used to generate responses.
- enable_multi_turn: This feature is experimental and will be improved.

Retriever settings:
- top_k_retrieved_chunks: The number of chunks retrieved at the retriever stage
- lexical_alpha: This parameter controls how much weight is given to exact keyword matches when searching through documents.
- semantic_alpha: This parameter controls how much weight is given to semantic search. The lexical and semantic weights should sum to 1.

Reranker settings:
- top_k_reranked_chunks: The number of chunks returned at the reranker stage
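
Because the lexical and semantic weights should sum to 1, it can be convenient to derive one from the other instead of typing both. The helper below is a minimal sketch: `build_retrieval_config` is a hypothetical function, and the range checks are my own sanity checks rather than documented platform limits. The key names match the `retrieval_config` block used in the update cells in this section.

```python
def build_retrieval_config(top_k_retrieved_chunks, lexical_alpha):
    """Build a retrieval_config dict whose two alpha weights sum to 1."""
    # These bounds are assumptions for illustration, not documented limits.
    if not 0.0 <= lexical_alpha <= 1.0:
        raise ValueError("lexical_alpha must be between 0 and 1")
    if top_k_retrieved_chunks < 1:
        raise ValueError("top_k_retrieved_chunks must be at least 1")
    # Derive the semantic weight; round to avoid float noise such as 0.7000000000000001.
    semantic_alpha = round(1.0 - lexical_alpha, 6)
    return {
        "top_k_retrieved_chunks": top_k_retrieved_chunks,
        "lexical_alpha": lexical_alpha,
        "semantic_alpha": semantic_alpha,
    }


# Same nesting as the extra_body argument passed to client.agents.update.
payload = {"agent_configs": {"retrieval_config": build_retrieval_config(10, 0.3)}}
print(payload)
```

The resulting `payload` can then be passed as `extra_body=payload` to `client.agents.update`.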


In [29]:
# Simple update focusing on retrieval parameters
response = client.agents.update(
    agent_id=agent_id,
    extra_body={
        "agent_configs": {
            "retrieval_config": {
                "top_k_retrieved_chunks": 10,
                "lexical_alpha": 0.5,
                "semantic_alpha": 0.5
            }
        }
    }
)

In [20]:
# Update focusing on filtering and reranking
response = client.agents.update(
    agent_id=agent_id,
    extra_body={
        "agent_configs": {
            "filter_and_rerank_config": {
                "top_k_reranked_chunks": 5
            },
            "global_config": {
                "enable_rerank": True,
                "enable_filter": True
            }
        }
    }
)

Get the agent metadata, which will show the retrieval changes.

In [None]:
agent_info = client.agents.metadata(agent_id=agent_id)
print(agent_info.agent_configs)

In [26]:
# Complete configuration update using all available parameters -- you shouldn't need to change all of these
response = client.agents.update(
    agent_id=agent_id,
    extra_body={
        "agent_configs": {
            "retrieval_config": {
                "top_k_retrieved_chunks": 10,
                "lexical_alpha": 0.5,
                "semantic_alpha": 0.5
            },
            "filter_and_rerank_config": {
                "top_k_reranked_chunks": 5
            },
            "global_config": {
                "enable_rerank": True,
                "enable_filter": True,
                "enable_multi_turn": False
            }
        }
    }
)

## 6: Filter Model / Prompt 

Filter prompts are used to reduce or filter the retrieved documents flowing to the Contextual AI Grounded Language Model (GLM). Specifically, it filters documents coming out of the reranker and prior to the GLM. For background, the flow of retrieved documents is: Retrievers --> Rerank --> Filter Prompt --> GLM

A filter prompt is helpful for selecting relevant documents from a larger pool of documents. For example, "if a query mentions a date, only select docs from within 6 months of that date" or "if a query mentions <customer-specific term>, exclude all docs without that term".

In [None]:
filter_prompt = "Always reply with no"
client.agents.update(agent_id=agent_id, filter_prompt=filter_prompt)

Let's verify the filter prompt has been modified.

In [None]:
agent_config = client.agents.metadata(agent_id=agent_id)
print(agent_config.filter_prompt)

Because this filter prompt tells the filter to reject every retrieved document, the agent should refuse to answer in this example. We can verify that with a new query.

In [None]:
query_result = client.agents.query.create(
    agent_id=agent_id,
    messages=[{
        "content": "what was the sales for Apple in 2022",
        "role": "user"
    }]
)
print(query_result.message.content)

In [None]:
# Remove the filter prompt, so we can keep using the agent in this notebook
filter_prompt = ""

client.agents.update(agent_id=agent_id, filter_prompt=filter_prompt)

agent_config = client.agents.metadata(agent_id=agent_id)
print(agent_config.filter_prompt)
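
When experimenting with filter prompts, it is easy to forget the cleanup step. One option is a context manager that restores the previous prompt automatically. This is a sketch under assumptions: `temporary_filter_prompt` is a hypothetical helper that uses only the `client.agents.metadata` and `client.agents.update` calls shown above, and `FakeAgents` is an offline stand-in so the cell runs without touching a real agent.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def temporary_filter_prompt(client, agent_id, filter_prompt):
    """Apply a filter prompt inside a with-block, then restore the old one."""
    previous = client.agents.metadata(agent_id=agent_id).filter_prompt
    client.agents.update(agent_id=agent_id, filter_prompt=filter_prompt)
    try:
        yield
    finally:
        # Restore even if a query inside the block raises; "" clears the prompt.
        client.agents.update(agent_id=agent_id, filter_prompt=previous or "")


class FakeAgents:
    """Offline stand-in that mimics the two client.agents calls used above."""

    def __init__(self):
        self.filter_prompt = "original"

    def metadata(self, agent_id):
        return SimpleNamespace(filter_prompt=self.filter_prompt)

    def update(self, agent_id, filter_prompt):
        self.filter_prompt = filter_prompt


fake_client = SimpleNamespace(agents=FakeAgents())
with temporary_filter_prompt(fake_client, "agent-123", "Always reply with no"):
    prompt_inside = fake_client.agents.filter_prompt
prompt_after = fake_client.agents.filter_prompt
print(prompt_inside, "|", prompt_after)
```

With the real client, queries placed inside the `with` block would run under the temporary filter prompt.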

## 7: Generation Settings

Several settings are available for controlling how responses are generated. These are advanced settings. Please refer to the documentation for the full list and details.

- max_new_tokens
- temperature
- top_p
- frequency_penalty
- seed

In [30]:
# Update focusing on generation parameters
response = client.agents.update(
    agent_id=agent_id,
    extra_body={
        "agent_configs": {
            "generate_response_config": {
                "max_new_tokens": 500,
                "temperature": 0.7,
                "top_p": 0.95,
                "frequency_penalty": 0.5,
                "seed": 42
            }
        }
    }
)

In [None]:
agent_info = client.agents.metadata(agent_id=agent_id)
print(agent_info.agent_configs)
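
After several updates, it helps to see exactly which settings changed. The sketch below compares two configuration snapshots held as plain nested dictionaries. `flatten` and `diff_configs` are hypothetical helpers, the sample values are made up, and converting the `agent_configs` object returned by the client into a dictionary is left as an assumption (a pydantic-style `model_dump()` may work, but that is not confirmed here).

```python
def flatten(config, prefix=""):
    """Flatten nested dicts into {"section.key": value} pairs."""
    flat = {}
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def diff_configs(before, after):
    """Return {path: (old, new)} for every setting that differs."""
    flat_before, flat_after = flatten(before), flatten(after)
    changes = {}
    for path in sorted(set(flat_before) | set(flat_after)):
        old, new = flat_before.get(path), flat_after.get(path)
        if old != new:
            changes[path] = (old, new)
    return changes


# Made-up snapshots using key names from the update cells above.
before = {
    "retrieval_config": {"top_k_retrieved_chunks": 10, "lexical_alpha": 0.5},
    "generate_response_config": {"temperature": 0.0},
}
after = {
    "retrieval_config": {"top_k_retrieved_chunks": 10, "lexical_alpha": 0.5},
    "generate_response_config": {"temperature": 0.7, "seed": 42},
}
changes = diff_configs(before, after)
for path, (old, new) in changes.items():
    print(f"{path}: {old} -> {new}")
```

A setting that is missing on one side shows up with `None` for that side.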