# Learning Objectives

- Build an LLM assistant for document-based Q&A using retrieval-augmented generation.

# Setup

In [1]:
import chromadb

from openai import AzureOpenAI

from langchain_openai import AzureOpenAIEmbeddings
from langchain_chroma import Chroma

In [None]:
import os
from dotenv import load_dotenv
load_dotenv()

azure_api_key = os.getenv('azure_api_key')
# Modify the Azure Endpoint and the API Versions as needed
azure_base_url = os.getenv('azure_base_url')
azure_api_version = os.getenv('azure_api_version')
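
`os.getenv` returns `None` for an unset variable, so a missing setting only surfaces later, when the client is first called. The following is a minimal sketch of an early check, assuming the three variable names used above. The helper name `missing_env_vars` is hypothetical and not part of any library.

```python
import os

# Variable names taken from the setup cell above.
REQUIRED_VARS = ("azure_api_key", "azure_base_url", "azure_api_version")


def missing_env_vars(names=REQUIRED_VARS, environ=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All Azure OpenAI settings are present.")
```

Passing a plain dictionary as `environ` makes the check easy to try out without touching the real environment.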


In [3]:
client = AzureOpenAI(
  azure_endpoint = azure_base_url,
  api_key = azure_api_key,
  api_version = azure_api_version
)

In [4]:
model_name = 'gpt-4o-mini' # deployment name

In [5]:
embedding_model = AzureOpenAIEmbeddings(
    api_key = azure_api_key,
    azure_endpoint = azure_base_url,
    api_version = azure_api_version,
    azure_deployment="text-embedding-3-large"
)

# Load the Vector Database

Since we persisted the database to a folder, we can load it from that location.

In practice, the database is maintained as a separate entity and CRUD operations are managed just as one would for normal databases (e.g., relational databases).

Now that the database is uploaded onto the Colab instance, we can unzip it and attach a retriever.

In [6]:
chromadb_client = chromadb.PersistentClient(
    path="./tesla_db"
)

In [7]:
tesla_10k_collection = 'tesla-10k-2019-to-2023'

In [8]:
vectorstore_persisted = Chroma(
    collection_name=tesla_10k_collection,
    collection_metadata={"hnsw:space": "cosine"},
    embedding_function=embedding_model,
    client=chromadb_client,
    persist_directory="./tesla_db"
)

In [9]:
retriever = vectorstore_persisted.as_retriever(
    search_type='similarity',
    search_kwargs={'k': 5}
)
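
To make the retriever settings concrete, here is a rough pure-Python sketch of what a `similarity` search with cosine scoring and `k` results amounts to. It is an illustration only: it scans every vector exhaustively, which is an assumption made for clarity and not how Chroma's index works internally. The names `cosine_similarity` and `top_k` and the toy vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=5):
    """Return (index, score) pairs for the k document vectors most similar to the query."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings"; real embedding vectors have far more dimensions.
docs = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
query = [1.0, 0.05, 0.0]

for index, score in top_k(query, docs, k=2):
    print(index, round(score, 3))
```

In the notebook, the query vector comes from the embedding model and the document vectors are the ones already stored in the collection.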

# RAG Q&A

## Prompt Design

The RAG system message should clearly communicate to the LLM that the input will include a user query along with the necessary context information to address that query. Additionally, the response should rely solely on the context information provided.

In [10]:
qna_system_message = """
You are an assistant to a financial services firm who answers user queries on annual reports.
User input will have the context required by you to answer user queries.
This context will be delimited by: <Context> and </Context>.
The context contains references to specific portions of a document relevant to the user query.

User queries will be delimited by: <Question> and </Question>.

Please answer user queries only using the context provided in the input.
Do not mention anything about the context in your final answer. Your response should only contain the answer to the question.

If the answer is not found in the context, respond "I don't know".
"""

In [11]:
qna_user_message_template = """
<Context>
Here are some documents that are relevant to the question mentioned below.
{context}
</Context>

<Question>
{question}
</Question>
"""

## Retrieving relevant documents

In [12]:
user_query = "What was the annual revenue of the company in 2022?"

In [13]:
relevant_document_chunks = retriever.invoke(user_query)

In [14]:
len(relevant_document_chunks)

5

We can inspect the first document like so:

In [15]:
for document in relevant_document_chunks:
    print(document.page_content.replace("\t", " "))
    break

Furthermore, significant judgment is required in evaluating our tax positions. In the ordinary course of business, there are many transactions and
calculations for which the ultimate tax settlement is uncertain. As a result, we recognize the effect of this uncertainty on our tax attributes or taxes
payable based on our estimates of the eventual outcome. These effects are recognized when, despite our belief that our tax return positions are
supportable, we believe that it is more likely than not that some of those positions may not be fully sustained upon review by tax authorities. We are
required to file income tax returns in the U.S. and various foreign jurisdictions, which requires us to interpret the applicable tax laws and regulations in
effect in such jurisdictions. Such returns are subject to audit by the various federal, state and foreign taxing authorities, who may disagree with respect to
our tax positions. We believe that our consideration is adequate for all open audit years
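
To scan all retrieved chunks at a glance instead of printing only the first, a shortened one-line preview per chunk can help. This is a sketch under assumptions: `StubDocument` is a hypothetical stand-in that exposes only the `page_content` attribute used above, so the snippet runs without the vector store, and `preview` is an illustrative helper name.

```python
from dataclasses import dataclass


@dataclass
class StubDocument:
    """Hypothetical stand-in exposing only the `page_content` attribute."""
    page_content: str


def preview(documents, width=80):
    """Return one shortened, single-line preview per document."""
    previews = []
    for document in documents:
        text = " ".join(document.page_content.split())  # collapse tabs and newlines
        if len(text) > width:
            text = text[: width - 3] + "..."
        previews.append(text)
    return previews


sample = [StubDocument("Total\trevenues\nwere reported."), StubDocument("x" * 200)]
for number, line in enumerate(preview(sample), start=1):
    print(number, line)
```

In the notebook, `preview(relevant_document_chunks)` would be the corresponding call.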

## Composing the response

To compose the response to user queries, we assemble the prompt that uses the system message defined above and the dynamically retrieved context for the user query.

In [16]:
user_query = "What was the annual revenue of the company in 2022?"

In [17]:
relevant_document_chunks = retriever.invoke(user_query)
context_list = [d.page_content for d in relevant_document_chunks]
context_for_query = "\n---\n".join(context_list)

prompt = [
    {'role': 'developer', 'content': qna_system_message},
    {'role': 'user', 'content': qna_user_message_template.format(
         context=context_for_query,
         question=user_query
        )
    }
]

try:
    response = client.chat.completions.create(
        model=model_name,
        messages=prompt,
        temperature=0
    )

    prediction = response.choices[0].message.content.strip()
except Exception as e:
    prediction = f'Sorry, I encountered the following error: \n {e}'

print(prediction)

The annual revenue of the company in 2022 was $96.77 billion.
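
Because the system message asks the model to reply "I don't know" when the context does not contain the answer, calling code can check for that reply and handle it separately, for example by logging the query. The following is a minimal sketch; the helper name `is_fallback` is hypothetical, and the normalization rules are an assumption about how the reply might vary in quoting and punctuation.

```python
FALLBACK = "i don't know"


def is_fallback(answer):
    """Return True if the answer is the fallback reply, ignoring case, quotes and a final period."""
    normalized = answer.strip().strip("\"'.").lower()
    normalized = normalized.replace("\u2019", "'")  # treat a curly apostrophe like a straight one
    return normalized == FALLBACK


for reply in ["I don't know", "The annual revenue of the company in 2022 was $96.77 billion."]:
    print(is_fallback(reply), "|", reply)
```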


# A RAG Assistant using Gradio

Let us put the code from this notebook into a file `rag-chat.py` that launches a basic Gradio chat interface when run from the terminal. This naive implementation nevertheless illustrates how document Q&A could be automated.

Test Queries:
- What was the total revenue of the company in 2022?
- Summarize 5 key risks identified in the 2023 10-K report. Respond with bullet point summaries.
- What is the view of the management on the future of electric vehicle batteries?
- What was the company's debt level in 2023?

In [30]:
import os
import chromadb
import gradio as gr

from dotenv import load_dotenv
from openai import AzureOpenAI
from langchain_openai import AzureOpenAIEmbeddings
from langchain_chroma import Chroma

# Configuration
model_name = 'gpt-4o-mini'  # deployment name
tesla_10k_collection = 'tesla-10k-2019-to-2023'

# Read credentials from a local .env file, as in the Setup section
load_dotenv()

azure_api_key = os.getenv('azure_api_key')
azure_base_url = os.getenv('azure_base_url')
azure_api_version = os.getenv('azure_api_version')

# Azure OpenAI Client
client = AzureOpenAI(
    azure_endpoint=azure_base_url,
    api_key=azure_api_key,
    api_version=azure_api_version
)

# Embedding Model
embedding_model = AzureOpenAIEmbeddings(
    api_key=azure_api_key,
    azure_endpoint=azure_base_url,
    api_version=azure_api_version,
    azure_deployment="text-embedding-3-large"
)

# Vector Store (Chroma)
chromadb_client = chromadb.PersistentClient(path="./tesla_db")

vectorstore_persisted = Chroma(
    collection_name=tesla_10k_collection,
    collection_metadata={"hnsw:space": "cosine"},
    embedding_function=embedding_model,
    client=chromadb_client,
    persist_directory="./tesla_db"
)

retriever = vectorstore_persisted.as_retriever(
    search_type='similarity',
    search_kwargs={'k': 5}
)

# Prompt Template
qna_system_message = """
You are an assistant to a financial services firm who answers user queries on annual reports.
User input will have the context required by you to answer user queries.
This context will be delimited by: <Context> and </Context>.
The context contains references to specific portions of a document relevant to the user query.

User queries will be delimited by: <Question> and </Question>.

Please answer user queries only using the context provided in the input.
Do not mention anything about the context in your final answer. Your response should only contain the answer to the question.

If the answer is not found in the context, respond "I don't know".
"""

qna_user_message_template = """
<Context>
Here are some documents that are relevant to the question mentioned below.
{context}
</Context>

<Question>
{question}
</Question>
"""

# Core response logic
def respond(user_query):
    relevant_document_chunks = retriever.invoke(user_query)
    context_list = [d.page_content for d in relevant_document_chunks]
    context_for_query = "\n---\n".join(context_list)

    prompt = [
        {'role': 'developer', 'content': qna_system_message},
        {
            'role': 'user', 'content': qna_user_message_template.format(
                context=context_for_query,
                question=user_query)
        }
    ]

    try:
        response = client.chat.completions.create(
            model=model_name,
            messages=prompt,
            temperature=0
        )

        answer = response.choices[0].message.content.strip()
    except Exception as e:
        answer = f'Sorry, I encountered the following error:\n{e}'

    return answer

# Gradio handler
def chat_interface(user_input, chat_history):
    try:
        response = respond(user_input)
    except Exception as e:
        response = f"❌ Error: {str(e)}"
    chat_history.append((user_input, response))
    return "", chat_history


def clear_chat():
    return [], ""

with gr.Blocks(title="RAG Chat - Tesla 10K Assistant", theme=gr.themes.Soft()) as demo:
    # Header
    gr.Markdown("""
    <h1 style="text-align: center;">🤖 Tesla 10-K Financial Assistant</h1>
    <p style="text-align: center;">Ask questions about Tesla's annual reports (2019–2023)</p>
    """, elem_id="header")

    # Chatbot + Input Area
    with gr.Column(variant="panel"):
        chatbot = gr.Chatbot(label="📊 Assistant", show_copy_button=True, bubble_full_width=False)
        with gr.Row():
            msg = gr.Textbox(
                placeholder="Type your financial question here...",
                label="Your Question",
                scale=10
            )
            send_btn = gr.Button("🚀 Send", scale=2)

    # Footer Buttons
    with gr.Row():
        clear_btn = gr.Button("🧹 Clear Chat", variant="stop")

    # Function bindings
    send_btn.click(chat_interface, [msg, chatbot], [msg, chatbot])
    msg.submit(chat_interface, [msg, chatbot], [msg, chatbot])
    clear_btn.click(clear_chat, [], [chatbot, msg])

# Launch app
if __name__ == "__main__":
    demo.launch(share=True)




* Running on local URL:  http://127.0.0.1:7864
* Running on public URL: https://a4c2ab2527a9bddfe3.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
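
The handler bound to the button and the textbox returns two values: an empty string, which clears the input box, and the updated history, which refreshes the chat display. That logic can be exercised without launching Gradio or calling Azure. This is a sketch in which `make_chat_handler` and `echo` are hypothetical names, and `echo` stands in for the real `respond` function.

```python
def make_chat_handler(respond_fn):
    """Build a handler with the same (input, history) -> ("", history) shape as `chat_interface`."""
    def handler(user_input, chat_history):
        try:
            reply = respond_fn(user_input)
        except Exception as exc:  # surface the error in the chat instead of crashing
            reply = f"Error: {exc}"
        chat_history.append((user_input, reply))
        return "", chat_history
    return handler


def echo(query):
    """Stub standing in for `respond`; no retrieval or model call happens here."""
    return f"You asked: {query}"


handler = make_chat_handler(echo)
box, history = handler("What was the total revenue of the company in 2022?", [])
print(repr(box))
print(history)
```

Swapping `echo` for the real `respond` gives the same behavior as `chat_interface`, which makes the stubbed version convenient for quick checks of the UI wiring.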


<font size=6; color='blue'> **Happy Learning!** </font>
___