<img src="https://imagedelivery.net/Dr98IMl5gQ9tPkFM5JRcng/3e5f6fbd-9bc6-4aa1-368e-e8bb1d6ca100/Ultra" alt="Image description" width="160" />

<br/>

# Tracking Changes in Long Policy Documents Using Contextual AI

Contextual AI lets you create and use generative AI agents. This notebook walks through an example of analyzing complex policy documents and how they evolve over time. RAG agents address two common challenges here: analyzing lengthy documents and identifying policy changes across multiple versions.

This notebook covers the following steps:
- Creating a Datastore
- Ingesting Documents
- Creating a RAG Agent
- Querying a RAG Agent

Excluding any model tuning, the notebook can be run in under 15 minutes.
The full documentation is available at [docs.contextual.ai](https://docs.contextual.ai/).

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ContextualAI/examples/blob/main/05-policy-changes/policy-example.ipynb)

In [None]:
!pip install contextual-client

In [2]:
import os
import requests
import json
from pathlib import Path
from typing import List, Optional, Dict
from IPython.display import display, JSON
import pandas as pd
from contextual import ContextualAI

In [3]:
# Set up the API key
# os.environ["CONTEXTUAL_API_KEY"] = API_KEY  # Optionally store the API key as an environment variable

client = ContextualAI(
    api_key=os.environ.get("CONTEXTUAL_API_KEY")  # or paste your API key string here
)
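To keep the key out of the notebook, one option is to read it from the environment and fail early with a clear message when it is missing. This is a minimal sketch; the helper name `load_api_key` is hypothetical and not part of the Contextual client.

```python
import os

def load_api_key(env_var: str = "CONTEXTUAL_API_KEY") -> str:
    """Return the API key stored in `env_var`, or raise a clear error."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before creating the client."
        )
    return key

# Usage (assumes the key has been exported in the environment):
# client = ContextualAI(api_key=load_api_key())
```

Failing at this point avoids a less obvious authentication error later, when the first API call is made.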

Let's download the files into the local environment if you don't already have them.

In [None]:
def fetch_file(filepath):
    # Ensure the target directory exists before writing the file
    # (skipped when the file goes into the current directory)
    directory = os.path.dirname(filepath)
    if directory:
        os.makedirs(directory, exist_ok=True)

    if not os.path.exists(filepath):
        print(f"Fetching {filepath}")
        url = f"https://raw.githubusercontent.com/ContextualAI/examples/main/05-policy-changes/{filepath}"
        response = requests.get(url)

        if response.status_code == 200:
            with open(filepath, 'wb') as f:
                f.write(response.content)
            print(f"Saved {filepath}")
        else:
            print(f"Failed to fetch {filepath}. HTTP status code: {response.status_code}")

fetch_file('data/FEMA_v2_2017.pdf')
fetch_file('data/FEMA_v3.1_2018.pdf')
fetch_file('data/FEMA_v4_2020.pdf')
# Saved outside data/ so it is not ingested until the later update step
fetch_file('FEMA_2025_updates.pdf')
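Because `fetch_file` only prints a message when a download fails, it can help to confirm that each expected file exists and looks like a PDF before ingesting it. The sketch below assumes the files are PDFs that begin with the standard `%PDF-` header; `looks_like_pdf` is a hypothetical helper, not part of any library used here.

```python
from pathlib import Path

def looks_like_pdf(path) -> bool:
    """Return True if `path` exists and starts with the PDF magic bytes."""
    p = Path(path)
    if not p.is_file():
        return False
    with open(p, "rb") as f:
        return f.read(5) == b"%PDF-"

# Report any expected file that is missing or is not a PDF.
for name in ["data/FEMA_v2_2017.pdf", "data/FEMA_v3.1_2018.pdf", "data/FEMA_v4_2020.pdf"]:
    if not looks_like_pdf(name):
        print(f"Missing or invalid: {name}")
```

A check like this catches the case where an error page was saved in place of the document.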

## Step 1: Create your Datastore


You first need to create a datastore for your agent using the `/datastores` endpoint. A datastore provides secure storage for your data, and each agent has its own datastore.

In [None]:
result = client.datastores.create(name="Demo_FEMA")
datastore_id = result.id
print(f"Datastore ID: {datastore_id}")

## Step 2: Ingest Documents into your Datastore

You can now ingest documents into your agent's datastore using the `/datastores` endpoint. Each document must be a PDF or HTML file.

I have wrapped the ingest call in a helper function that uploads all the PDF and HTML files in the `data` folder.

In [None]:
def ingest_documents(folder_path, datastore_id) -> Dict[str, str]:
    folder = Path(folder_path)
    document_ids = {}  # Dictionary to store filename: document_id pairs
    
    for file_path in folder.iterdir():
        if file_path.is_file() and file_path.suffix.lower() in ['.pdf', '.html']:
            try:
                with open(file_path, 'rb') as f:
                    ingestion_result = client.datastores.documents.ingest(datastore_id, file=f)
                    document_ids[file_path.name] = ingestion_result.id
                    print(f"Successfully uploaded {file_path.name} to datastore {datastore_id}")
            except Exception as e:
                print(f"Error uploading {file_path.name}: {str(e)}")
    
    return document_ids

# Usage example
folder_path = 'data'
uploaded_docs = ingest_documents(folder_path, datastore_id)

# Now you can access the document IDs like this:
for filename, doc_id in uploaded_docs.items():
    print(f"File: {filename} -> Document ID: {doc_id}")
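Before uploading, it can be useful to preview which files the ingest loop would pick up. The sketch below applies the same suffix filter without calling the API; `list_ingestable` is a hypothetical helper name.

```python
from pathlib import Path
from typing import List

def list_ingestable(folder_path, suffixes=(".pdf", ".html")) -> List[str]:
    """Return the sorted names of files in `folder_path` with an accepted suffix."""
    folder = Path(folder_path)
    if not folder.is_dir():
        return []
    return sorted(
        p.name
        for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in suffixes
    )

# Dry run: shows what would be uploaded from the data folder.
print(list_ingestable("data"))
```

Running this first makes it easy to spot stray files in the folder before they are sent to the datastore.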

Once ingested, you can view the list of documents, see their metadata, and also delete documents. The dictionary `uploaded_docs` has the document IDs.
These are very large files, so don't be surprised if it takes a few minutes to finish processing. I have included a simple script for checking the progress. In the meantime, you can create the agent.

In [None]:
metadata_status = {}

# Loop through each document in uploaded_docs
for filename, doc_id in uploaded_docs.items():
    try:
        metadata = client.datastores.documents.metadata(
            datastore_id=datastore_id,
            document_id=doc_id
        )
        metadata_status[filename] = metadata.status
        print(f"Document: {filename}")
        print(f"Status: {metadata.status}")
        print("-" * 50)
    except Exception as e:
        print(f"Error getting metadata for {filename}: {str(e)}")
        metadata_status[filename] = "error"

# Print summary of all document statuses
print("\nSummary of document processing status:")
for filename, status in metadata_status.items():
    print(f"{filename}: {status}")

# Check if all documents are ready
all_ready = all(status == "complete" for status in metadata_status.values())
print(f"\nAll documents ready: {'Yes' if all_ready else 'No'}")
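The cell above checks the status once. If you would rather block until processing finishes, a polling loop with a timeout is one option. This is a minimal sketch: `wait_until_ready` is a hypothetical helper, and the status strings are assumptions (the check above compares against `"complete"`; `"completed"` and `"failed"` are guesses you should adjust to whatever the metadata call actually returns).

```python
import time
from typing import Callable, Dict, Iterable

def wait_until_ready(
    get_status: Callable[[str], str],
    doc_ids: Iterable[str],
    ready_statuses=("complete", "completed"),  # assumed status names
    failed_statuses=("failed",),               # assumed status name
    interval_s: float = 30.0,
    timeout_s: float = 1800.0,
) -> Dict[str, str]:
    """Poll `get_status` until every document is ready or failed, or time runs out."""
    pending = set(doc_ids)
    statuses: Dict[str, str] = {}
    deadline = time.monotonic() + timeout_s
    while pending:
        for doc_id in list(pending):
            statuses[doc_id] = get_status(doc_id)
            if statuses[doc_id] in ready_statuses or statuses[doc_id] in failed_statuses:
                pending.discard(doc_id)
        if not pending or time.monotonic() >= deadline:
            break
        time.sleep(interval_s)
    return statuses

# Usage with the notebook's objects (not run here):
# statuses = wait_until_ready(
#     lambda doc_id: client.datastores.documents.metadata(
#         datastore_id=datastore_id, document_id=doc_id
#     ).status,
#     uploaded_docs.values(),
# )
```

Passing the status lookup as a callable keeps the loop independent of the client, so it can be tried out with a stub before pointing it at the real API.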

## Step 3: Create your Agent

Next, let's create the agent and tailor it to our needs.

Optional parameters let you set a system prompt or use a previously tuned model.

`system_prompt` is used for the instructions that your RAG agent references when generating responses. Note that we do not guarantee that the system will follow these instructions exactly.

Here I have modified the system prompt so the agent keeps the document versions and their differences in mind. You should adapt the system prompt to your own use case.

In [14]:
system_prompt = '''
You are an analyst focused on identifying differences across documents. V4 was published in 2020, V3.1 was published in 2018 and V2 was published in 2017. If retrieved, V5 is proposed for 2025. When discussing policy keep in mind the version and consider differences in other versions. 
'''
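If you track many versions, the version list in the prompt can be generated from a small mapping rather than typed by hand. This is a minimal sketch; `build_system_prompt` and its arguments are hypothetical, and the wording simply mirrors the prompt used in this notebook.

```python
def build_system_prompt(versions: dict, proposed: dict = None) -> str:
    """Compose a system prompt that lists published and proposed versions."""
    published = ", ".join(
        f"{name} was published in {year}" for name, year in versions.items()
    )
    parts = [
        "You are an analyst focused on identifying differences across documents.",
        f"{published}.",
    ]
    for name, year in (proposed or {}).items():
        parts.append(f"If retrieved, {name} is proposed for {year}.")
    parts.append(
        "When discussing policy, keep in mind the version and consider "
        "differences in other versions."
    )
    return " ".join(parts)

generated_prompt = build_system_prompt(
    {"V4": 2020, "V3.1": 2018, "V2": 2017},
    proposed={"V5": 2025},
)
print(generated_prompt)
```

The result can be passed as `system_prompt` when creating the agent, in place of the hand-written string.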


Let's create our agent. 

In [None]:
app_response = client.agents.create(
    name="Demo-PolicyChanges",
    description="Agent to identify policy changes in FEMA documents",
    system_prompt=system_prompt,
    datastore_ids=[datastore_id]
)
agent_id = app_response.id
print(f"Agent ID created: {agent_id}")

## Step 4: Query your Agent

Let's query our agent to see if it's working. The required inputs are `agent_id` and `messages`.

A good starting query for this use case is how cost eligibility has changed. It shows how the agent works across multiple large documents and tracks changes between them.

**Note:** It may take a few minutes for the documents to be ingested and processed. The agent will give a detailed answer once the documents are ingested.

Let's ask it about cost eligibility changes. (Even though the query doesn't end with a question mark, the agent still answers it.)

In [None]:
query_result = client.agents.query.create(
    agent_id=agent_id,
    messages=[{
        "content": "How has cost eligibility changed since 2017",
        "role": "user"
    }]
)
print(query_result.message.content)

Another question that shows the comparison well, and for which it is worth looking at the retrieved results, is:

In [None]:
query_result = client.agents.query.create(
    agent_id=agent_id,
    messages=[{
        "content": "What's changed with the Small Business Administration Loan Requirement",
        "role": "user"
    }]
)
print(query_result.message.content)
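To look at what was retrieved for a query, you can list the source document and page of each retrieved chunk. The sketch below assumes the query response exposes a `retrieval_contents` list whose items carry `doc_name` and `page` attributes; treat those names as assumptions and check them against the API reference. `summarize_retrievals` is a hypothetical helper, and the `demo` object is a stand-in with placeholder values, not real output.

```python
from types import SimpleNamespace

def summarize_retrievals(query_result, limit: int = 5):
    """Return (doc_name, page) pairs for the first `limit` retrieved chunks.

    Attributes that are missing on the response fall back to None.
    """
    contents = getattr(query_result, "retrieval_contents", None) or []
    return [
        (getattr(item, "doc_name", None), getattr(item, "page", None))
        for item in contents[:limit]
    ]

# Stand-in object for illustration only; pass the real query_result instead.
demo = SimpleNamespace(retrieval_contents=[
    SimpleNamespace(doc_name="FEMA_v4_2020.pdf", page=1),
    SimpleNamespace(doc_name="FEMA_v2_2017.pdf", page=2),
])
for doc_name, page in summarize_retrievals(demo):
    print(f"{doc_name} (page {page})")
```

Seeing which versions the chunks came from is a quick way to confirm that the answer draws on more than one document.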


This is just the start. You can continue sending queries through the API or try more queries in the UI. Everything you can do in the UI is also available through the API. Check out [docs.contextual.ai](https://docs.contextual.ai/).

In [None]:
# Linking to the UI
tenant = ""  # put the name of your tenant here
print(f"Click on this link to query your Agent: https://app.contextual.ai/{tenant}/agents/{agent_id}/chat")

Some further queries to show off how this RAG Agent tracks changes over time:

Queries:
- What's changed with the Small Business Administration Loan Requirement
- How has the relationship to Indian tribal governments changed
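The queries above can be sent in one loop. This is a minimal sketch that reuses the `client.agents.query.create` call from earlier in the notebook; `run_queries` is a hypothetical helper, and each question is sent as an independent single-turn query.

```python
def run_queries(client, agent_id, questions):
    """Send each question as a separate single-turn query and collect the answers."""
    answers = {}
    for question in questions:
        result = client.agents.query.create(
            agent_id=agent_id,
            messages=[{"content": question, "role": "user"}],
        )
        answers[question] = result.message.content
    return answers

questions = [
    "What's changed with the Small Business Administration Loan Requirement",
    "How has the relationship to Indian tribal governments changed",
]

# Usage with the notebook's client and agent (not run here):
# for question, answer in run_queries(client, agent_id, questions).items():
#     print(f"Q: {question}\n{answer}\n")
```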

The agent can also help generate insights based on changes. Try these queries:

- How has the appeals process changed  
- Based on the appeal changes in version 4, how should we change our contracts with experts
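The second query builds on the answer to the first, so it may work better as a follow-up in the same conversation. The sketch below carries the first exchange forward in `messages`; it assumes the query endpoint accepts prior turns, including an `"assistant"` role for the agent's earlier answer, which you should confirm in the API reference. `ask_follow_up` is a hypothetical helper.

```python
def ask_follow_up(client, agent_id, first_question, follow_up):
    """Ask a question, then a follow-up that includes the first exchange as history."""
    messages = [{"content": first_question, "role": "user"}]
    first = client.agents.query.create(agent_id=agent_id, messages=messages)

    # Carry the agent's answer forward so the follow-up has context.
    messages.append({"content": first.message.content, "role": "assistant"})
    messages.append({"content": follow_up, "role": "user"})
    second = client.agents.query.create(agent_id=agent_id, messages=messages)

    return first.message.content, second.message.content

# Usage with the notebook's client and agent (not run here):
# appeals, advice = ask_follow_up(
#     client,
#     agent_id,
#     "How has the appeals process changed",
#     "Based on the appeal changes in version 4, how should we change our contracts with experts",
# )
```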

To see how the RAG agent responds to new information, add [FEMA_2025_updates](FEMA_2025_updates.pdf) to the datastore. This is a synthetic update that demonstrates how the agent incorporates new information. After loading it into the datastore, try the query:

- How has cost eligibility changed since 2017
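Adding the update comes down to one more ingest call against the same datastore. This is a minimal sketch that reuses the `client.datastores.documents.ingest` call from Step 2; `ingest_file` is a hypothetical helper, and the file path in the usage comment is an assumption about where you saved the file.

```python
from pathlib import Path

def ingest_file(client, datastore_id, file_path):
    """Upload a single file to the datastore and return its document ID."""
    path = Path(file_path)
    if not path.is_file():
        raise FileNotFoundError(f"{path} not found; download it first.")
    with open(path, "rb") as f:
        result = client.datastores.documents.ingest(datastore_id, file=f)
    return result.id

# Usage with the notebook's client and datastore (not run here):
# update_id = ingest_file(client, datastore_id, "FEMA_2025_updates.pdf")
# print(f"Document ID: {update_id}")
```

As with the earlier uploads, allow a few minutes for processing before querying again.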

## Next Steps

In this notebook, we've created a RAG agent that shows changes in FEMA policy. You can browse more examples in the [ContextualAI/examples](https://github.com/ContextualAI/examples) repository or learn more at [docs.contextual.ai](https://docs.contextual.ai/). Finally, reach out to your account team if you have further questions or issues.