<img src="https://imagedelivery.net/Dr98IMl5gQ9tPkFM5JRcng/3e5f6fbd-9bc6-4aa1-368e-e8bb1d6ca100/Ultra" alt="Image description" width="160" />

<br/>

# Lab 1: Create an Agent and Datastore

Contextual AI lets you create and use generative AI agents. This notebook walks through an end-to-end example workflow for creating a Retrieval-Augmented Generation (RAG) agent for a financial use case. The agent will answer questions based on the documents provided but avoid forward-looking statements (e.g., "Tell me about sales in 2028"). This notebook uses the Python client.

This notebook covers the following steps:
- Creating a Datastore
- Ingesting Documents
- Creating a RAG Agent
- Querying a RAG Agent

The full documentation is available at [docs.contextual.ai](https://docs.contextual.ai/)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ContextualAI/examples/blob/main/02-hands-on-lab/lab1_create_agent.ipynb)

## Prerequisites:

- API key: contact Contextual AI's sales team to get your API key.

- Data files: this demo uses 3 files: a document to ingest, an evaluation dataset, and a training dataset. These are toy datasets to illustrate the functionality of the platform.

      Ingestion: `Apple.pdf`

      Evaluation: `eval_short.csv`

      Training: `fin_train.jsonl`

- To use the Python client, run `pip install --pre contextual-client`

### Lab 1: Creating a Datastore and an Agent

First, we'll set up the Contextual [Python SDK](https://github.com/ContextualAI/contextual-client-python) 🐍:

In [None]:
!pip install contextual-client

In [3]:
import os
import requests
from contextual import ContextualAI
from IPython.display import display, Markdown

🔑 Replace the placeholder value below with your actual API key 👇🏼

In [8]:
API_KEY = "key-..."

In [3]:
contextual = ContextualAI(
    api_key=API_KEY,  # The key defined in the previous cell
)
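
Hard-coding a key in a notebook makes it easy to leak when the notebook is shared. As a minimal sketch of an alternative, the helper below reads the key from an environment variable that you export before starting the notebook. The helper name `load_api_key` and the variable name `CONTEXTUAL_API_KEY` are illustrative choices for this sketch, not something this lab defines.

```python
import os


def load_api_key(env_var="CONTEXTUAL_API_KEY"):
    """Return the API key stored in the given environment variable.

    `env_var` is a hypothetical name chosen for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key.")
    return key
```

With this in place, `API_KEY = load_api_key()` could replace the hard-coded assignment, and the client is then created with `ContextualAI(api_key=API_KEY)` exactly as above.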

### Step 1: Create your Datastore


You first need to create a datastore for your agent. A datastore is secure storage for unstructured data (documents). Each agent can have one or more datastores.

In [4]:
result = contextual.datastores.create(name="financial_research_datastore")
datastore_id = result.id
print(f"Datastore successfully created with ID: {datastore_id}")

Datastore successfully created with ID: 5ee7e13c-871f-406e-85e9-1cbc49679ec9


In [None]:
print(f"Click on the 🔗 to see your datastore: https://app.contextual.ai/ctx/datastores/{datastore_id}")

### Step 2: Ingest Documents into your Datastore

You can now ingest documents into your agent's datastore using the /datastores endpoint. Each document must be a PDF or HTML file.

This lab uses an example PDF, but you can also use your own documents here. Very long documents (hundreds of pages) can take longer to process.
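
If you do bring your own documents, a quick check before uploading can catch unsupported files early. The sketch below is one hedged way to do that. It assumes the file extension is a good enough proxy for the file type, and the names `SUPPORTED_SUFFIXES` and `is_supported_document` are hypothetical helpers, not part of the Contextual client.

```python
from pathlib import Path

# Assumption for this sketch: the extension reflects the real file type.
# The lab text only says "PDF or HTML", so only those two suffixes are listed.
SUPPORTED_SUFFIXES = {".pdf", ".html"}


def is_supported_document(path):
    """Return True if the path ends in one of the supported extensions."""
    return Path(path).suffix.lower() in SUPPORTED_SUFFIXES
```

For example, `is_supported_document("data/Apple.pdf")` returns `True`, while a `.docx` file would be rejected before any upload is attempted.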

In [None]:
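if not os.path.exists('data/Apple.pdf'):
    print("Fetching data/Apple.pdf")
    os.makedirs('data', exist_ok=True)  # Create the data directory if it is missing
    response = requests.get("https://raw.githubusercontent.com/ContextualAI/examples/refs/heads/main/02-hands-on-lab/data/Apple.pdf")
    response.raise_for_status()  # Stop here if the download failed
    with open('data/Apple.pdf', 'wb') as f:
        f.write(response.content)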

In [6]:
with open('data/Apple.pdf', 'rb') as f:
    ingestion_result = contextual.datastores.documents.ingest(datastore_id, file=f)
    document_id = ingestion_result.id
    print(f"Successfully uploaded document to datastore with document ID: {document_id}")

Successfully uploaded document to datastore with document ID: 5260268c-6891-41ba-a1b4-391cee6afd5a


Once ingested, you can view the list of documents, see their metadata, and also delete documents.

In [13]:
metadata = contextual.datastores.documents.metadata(datastore_id=datastore_id, document_id=document_id)
print("Document metadata:", metadata)

Document metadata: DocumentMetadata(id='5260268c-6891-41ba-a1b4-391cee6afd5a', name='Apple.pdf', status='processing')


In [18]:
print(f"Click on the 🔗 to see your document status: https://app.contextual.ai/ctx/datastores/{datastore_id}")

Click on the 🔗 to see your document status: https://app.contextual.ai/ctx/datastores/5ee7e13c-871f-406e-85e9-1cbc49679ec9


<img src="datastore.png" alt="View datastore information in the console">

*Note: make sure the document ingestion status above 👆 shows 'completed' before querying the agent.*
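
Instead of re-running the metadata cell by hand, you can poll until ingestion finishes. The sketch below is a minimal, SDK-independent version. It takes any zero-argument callable that returns the current status string. The helper name `wait_for_ingestion`, the polling interval, the timeout, and the `'failed'` terminal state are assumptions of this sketch; only `'processing'` and `'completed'` appear in this lab.

```python
import time


def wait_for_ingestion(get_status, poll_seconds=10, timeout_seconds=600):
    """Poll get_status() until it returns 'completed' or the timeout elapses.

    get_status: zero-argument callable returning the current status string.
    Treating 'failed' as a terminal state is an assumption of this sketch.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status == "completed":
            return status
        if status == "failed":
            raise RuntimeError("Document ingestion failed.")
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Ingestion still '{status}' after {timeout_seconds} seconds."
            )
        time.sleep(poll_seconds)
```

In this notebook, the callable could wrap the metadata call shown earlier, for example `wait_for_ingestion(lambda: contextual.datastores.documents.metadata(datastore_id=datastore_id, document_id=document_id).status)`.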

### Step 3: Create your Agent

Next, let's create the agent and tailor it to our needs.

Some additional parameters include setting a system prompt or using a previously tuned model.

`system_prompt` is used for the instructions that your RAG system references when generating responses. Note that we do not guarantee that the system will follow these instructions exactly.

In [9]:
system_prompt = '''
You are an AI assistant specialized in financial analysis and reporting. Your responses should be precise, accurate, and sourced exclusively from official financial documentation provided to you. Please follow these guidelines:

Data Analysis & Response Quality:
* Only use information explicitly stated in provided documentation (e.g., earnings releases, financial statements, investor presentations)
* Present comparative analyses using structured formats with tables and bullet points where appropriate
* Include specific period-over-period comparisons (quarter-over-quarter, year-over-year) when relevant
* Maintain consistency in numerical presentations (e.g., consistent units, decimal places)
* Flag any one-time items or special charges that impact comparability

Technical Accuracy:
* Use industry-standard financial terminology
* Define specialized acronyms on first use
* Never interchange distinct financial terms (e.g., revenue, profit, income, cash flow)
* Always include units with numerical values
* Pay attention to fiscal vs. calendar year distinctions
* Present monetary values with appropriate scale (millions/billions)

Response Format:
* Begin with a high-level summary of key findings when analyzing data
* Structure detailed analyses in clear, hierarchical formats
* Use markdown for lists, tables, and emphasized text
* Maintain a professional, analytical tone
* Present quantitative data in consistent formats (e.g., basis points for ratios)

Critical Guidelines:
* Avoid opinions, speculation, or assumptions
* If information is unavailable or irrelevant, clearly state this without additional commentary
* Answer questions directly, then stop
* Do not reference source document names or file types in responses
* Focus only on information that directly answers the query

For any analysis, provide comprehensive insights using all relevant available information while maintaining strict adherence to these guidelines and focusing on delivering clear, actionable information.
'''


Now we are ready to create our RAG agent.

In [10]:
app_response = contextual.agents.create(
    name="Financial Research Agent",
    description="Research Agent using only Historical Information",
    system_prompt=system_prompt,
    datastore_ids=[datastore_id]
)
agent_id = app_response.id
print(f"Agent created successfully with ID: {agent_id}")

Agent created successfully with ID: faf2cc13-a503-40e1-adc9-432b977d9b4a


### Step 4: Query your Agent

Let's query our agent to see if it's working and whether the answer provided is correct.

*Note: It may take a few minutes for the document to be ingested and processed. The agent will give a detailed answer once the documents are ingested.*

In [21]:
query_result = contextual.agents.query.create(
    agent_id=agent_id,
    messages=[{
        "content": "what was revenue for Apple in 2022",
        "role": "user"
    }]
)

display(Markdown(query_result.message.content))

**Revenue Analysis for Apple in 2022**

Based on the provided financial documentation, here is the revenue analysis for Apple in 2022:

**Total Revenue:**

* For the three months ended December 31, 2022: **$117,154 million**[1]()[2]()
* For the three months ended December 25, 2021: **$123,945 million**[1]()[2]()

**Revenue Breakdown:**

* **Products:**
	+ iPhone: $65,775 million (December 31, 2022) vs.[1]() $71,628 million (December 25, 2021)[1]()
	+ Mac: $7,735 million (December 31, 2022) vs.[1]() $10,852 million (December 25, 2021)[1]()
	+ iPad: $9,396 million (December 31, 2022) vs.[1]() $7,248 million (December 25, 2021)[1]()
	+ Wearables, Home and Accessories: $13,482 million (December 31, 2022) vs.[1]() $14,701 million (December 25, 2021)[1]()
* **Services:**
	+ Total Services Revenue: $20,766 million (December 31, 2022) vs.[1]()[2]() $19,516 million (December 25, 2021)[1]()[2]()

**Year-over-Year (YoY) Comparison:**

* Total Revenue: -5.5% YoY decrease[1]()[2]()
* Products Revenue: -7.6% YoY decrease[2]()
* Services Revenue: 6.5% YoY increase
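
To check whether the percentages in an answer like this are correct, you can recompute them from the dollar figures the agent reported. The sketch below does that for the three year-over-year lines. The helper `yoy_change_pct` is a hypothetical name, and the inputs are copied from the answer above rather than from the source PDF, so this only checks the answer's internal consistency.

```python
def yoy_change_pct(current, prior):
    """Return the year-over-year change from prior to current, in percent."""
    return (current - prior) / prior * 100


# Figures in millions of USD, copied from the agent's answer above
# (three months ended December 31, 2022 vs. December 25, 2021).
# Products is the sum of iPhone, Mac, iPad, and Wearables/Home/Accessories.
figures = {
    "Total": (117_154, 123_945),
    "Products": (65_775 + 7_735 + 9_396 + 13_482, 71_628 + 10_852 + 7_248 + 14_701),
    "Services": (20_766, 19_516),
}

for name, (current, prior) in figures.items():
    print(f"{name}: {yoy_change_pct(current, prior):+.1f}%")
```

With these inputs the recomputed changes are about -5.5% (Total), -7.7% (Products), and +6.4% (Services), so the Products and Services percentages in the answer differ from the recomputed values by 0.1 percentage point.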

You can also query the agent from the console. To do that, click on the following link:

In [23]:
print(f"Click on the 🔗 to query your agent: https://app.contextual.ai/ctx/agents/{agent_id}/chat")

Click on the 🔗 to query your agent: https://app.contextual.ai/ctx/agents/faf2cc13-a503-40e1-adc9-432b977d9b4a/chat


<img src="query.png" alt="Query information in the console.">

### Next Steps

Now let's move on to 👉 [Lab 2](/python/hands-on-lab/lab2_evalulate_agent.ipynb), where we show you how to evaluate the accuracy of your RAG agent.
