# L7: Conversational RAG

In [1]:
import warnings
warnings.filterwarnings('ignore')

## Import libraries

In [11]:
from ai21 import AI21Client
from ai21.models.chat import ChatMessage
import uuid
import time

## Load API key and create AI21Client

In [12]:
from utils import get_ai21_api_key
ai21_api_key = get_ai21_api_key()
client = AI21Client(api_key=ai21_api_key)

In [13]:
from utils import file_upload
file_id = file_upload(client)

Calling POST http://jupyter-api-proxy.internal.dlai/rev-proxy/ai21 failed with a non-200 response code: 403 headers: Headers({'date': 'Fri, 11 Apr 2025 19:29:44 GMT', 'content-type': 'application/json', 'content-length': '60', 'connection': 'keep-alive', 'server': 'gunicorn', 'x-frame-options': 'DENY', 'vary': 'Accept-Language, origin', 'content-language': 'en', 'x-content-type-options': 'nosniff', 'referrer-policy': 'same-origin', 'cross-origin-opener-policy': 'same-origin'})


AI21APIError: Failed with http status code: 403 (AI21APIError). Details: {"error": {"message": "Accessing library/files is invalid"}}

In [10]:
print(file_id)

NameError: name 'file_id' is not defined

In [4]:
from utils import call_convrag

conversation_history = []
def convrag_response(message):
    # Record the user turn so the model sees the full conversation
    conversation_history.append(ChatMessage(content=message, role="user"))
    chat_response = call_convrag(client, conversation_history)
    # The LLM response to the user query
    response = chat_response.choices[0].content
    # Most relevant retrieved text segment (kept for inspection, not returned)
    text_retrieval = chat_response.sources[0].text
    # Name of the file that contains the retrieved text segment
    file_retrieval = chat_response.sources[0].file_name
    # Record the assistant turn so follow-up questions have context
    conversation_history.append(ChatMessage(content=response, role="assistant"))
    return response
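
Note that `convrag_response` appends the user message to `conversation_history` before the API call, so a failed call leaves an unanswered user turn in the history. The sketch below shows one possible way to roll that back. It is not part of the course code: `Message`, `safe_turn`, `fake_ok`, and `fake_fail` are hypothetical stand-ins for `ChatMessage`, `convrag_response`, and `call_convrag`, so the example runs without an API key.

```python
from dataclasses import dataclass


@dataclass
class Message:
    # Stand-in for ai21.models.chat.ChatMessage, used only in this sketch
    content: str
    role: str


def safe_turn(history, message, call_fn):
    """Append the user turn, call the model, and undo the append on failure."""
    history.append(Message(content=message, role="user"))
    try:
        reply = call_fn(history)
    except Exception:
        history.pop()  # drop the unanswered user turn
        raise
    history.append(Message(content=reply, role="assistant"))
    return reply


# Hypothetical stand-ins for the real API call
def fake_ok(history):
    return f"echo: {history[-1].content}"


def fake_fail(history):
    raise RuntimeError("simulated server error")


history = []
safe_turn(history, "first question", fake_ok)
try:
    safe_turn(history, "second question", fake_fail)
except RuntimeError:
    pass

# Only the successful exchange remains in the history
print([m.role for m in history])
```

With this pattern, a retry of the failed question starts from a clean history instead of stacking duplicate user turns.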

## Prompt the Conversational RAG

<p style="background-color:#f7fff8; padding:15px; border-width:3px; border-color:#e0f0e0; border-style:solid; border-radius:6px"> 🚨
&nbsp; <b>Different Run Results:</b> The output generated by AI chat models can vary with each execution due to their probabilistic nature. Don't be surprised if your results differ from those shown in the video.</p>

In [5]:
message = "You are a financial analyst and what is the summary with Nvidia annual earnings report?"

response = convrag_response(message)

print(response)

I'm sorry, I cannot answer your questions based on the documents I have access to.


In [27]:
message = "How much did the Nvidia's revenue increase in the period?"

response = convrag_response(message)

print(response)

Calling POST http://jupyter-api-proxy.internal.dlai/rev-proxy/ai21 failed with a non-200 response code: 500 headers: Headers({'date': 'Fri, 11 Apr 2025 19:38:36 GMT', 'content-type': 'application/json', 'content-length': '34', 'connection': 'keep-alive', 'server': 'gunicorn', 'request-id': '9de83c30-457d-9daf-a38d-cdafbf4d5f94', 'via': '1.1 google', 'cf-cache-status': 'DYNAMIC', 'strict-transport-security': 'max-age=15552000; includeSubDomains', 'expect-ct': 'max-age=86400, enforce', 'referrer-policy': 'same-origin', 'x-content-type-options': 'nosniff', 'x-frame-options': 'SAMEORIGIN', 'x-xss-protection': '1; mode=block', 'cf-ray': '92ecf1f66e6a176a-SJC', 'vary': 'Accept-Language, origin', 'content-language': 'en', 'cross-origin-opener-policy': 'same-origin'})


Exception: Error occurred: Failed with http status code: 500 (AI21ServerError). Details: {"detail":"Internal Server Error"}

In [7]:
message = "Should I buy Nvidia stock now?"

response = convrag_response(message)

print(response)

Calling POST http://jupyter-api-proxy.internal.dlai/rev-proxy/ai21 failed with a non-200 response code: 500 headers: Headers({'date': 'Fri, 11 Apr 2025 19:25:13 GMT', 'content-type': 'application/json', 'content-length': '34', 'connection': 'keep-alive', 'server': 'gunicorn', 'request-id': 'fd24ad6e-f405-f3a9-e39f-c133d033bd48', 'via': '1.1 google', 'cf-cache-status': 'DYNAMIC', 'strict-transport-security': 'max-age=15552000; includeSubDomains', 'expect-ct': 'max-age=86400, enforce', 'referrer-policy': 'same-origin', 'x-content-type-options': 'nosniff', 'x-frame-options': 'SAMEORIGIN', 'x-xss-protection': '1; mode=block', 'cf-ray': '92ecde588a9acf22-SJC', 'vary': 'Accept-Language, origin', 'content-language': 'en', 'cross-origin-opener-policy': 'same-origin'})


Exception: Error occurred: Failed with http status code: 500 (AI21ServerError). Details: {"detail":"Internal Server Error"}
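
HTTP 500 responses are sometimes transient, and a common way to handle them is to retry with exponential backoff. The helper below is a generic sketch under that assumption, not something from `utils.py`: `with_retries` and `flaky` are hypothetical names, and the `sleep` parameter exists only so the demo can record delays instead of waiting. A retry would not help with a persistent failure such as the 403 from the file upload above.

```python
import time


def with_retries(fn, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(); after a failure wait base_delay * 2**i, then try again."""
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** i)


# Demo with a hypothetical function that fails twice, then succeeds
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("simulated 500")
    return "ok"


delays = []
result = with_retries(flaky, attempts=3, base_delay=0.5, sleep=delays.append)
print(result, delays)  # ok [0.5, 1.0]
```

If something like this were used here, it would make sense to wrap only the API call inside `convrag_response` rather than the whole function, since each call to `convrag_response` also appends to `conversation_history`. In real code the `except` clause should be narrowed to server-side errors.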

## Create a gradio chat app

In [14]:
import gradio as gr

demo = gr.Interface(
    fn=convrag_response,
    inputs=[gr.Textbox(label="Your questions:", lines=2)],
    outputs=[gr.Textbox(label="AI21 Conversational RAG answer:", lines=2)],
    examples=[
    "How have revenue, gross margin, and net income trended over the past year?",
    "What are the actions taken by the company about sustainability?",
    "What are the main risks of the company?",
    "How is the company allocating its capital (e.g., dividends, share repurchases, acquisitions)?",
    "Are there any concerning trends in operating cash flow?",
    ],
    title="Nvidia 10-K Q&A",
    description="Use AI21 Conversational RAG to retrieve insights from SEC filings",
    allow_flagging="never"
)


demo.launch(server_name="0.0.0.0")

* Running on local URL:  https://0.0.0.0:7860

To create a public link, set `share=True` in `launch()`.




Calling POST http://jupyter-api-proxy.internal.dlai/rev-proxy/ai21 failed with a non-200 response code: 500 headers: Headers({'date': 'Fri, 11 Apr 2025 19:33:30 GMT', 'content-type': 'application/json', 'content-length': '34', 'connection': 'keep-alive', 'server': 'gunicorn', 'request-id': '86a79e48-f332-372d-3d84-b9d81efd85f9', 'via': '1.1 google', 'cf-cache-status': 'DYNAMIC', 'strict-transport-security': 'max-age=15552000; includeSubDomains', 'expect-ct': 'max-age=86400, enforce', 'referrer-policy': 'same-origin', 'x-content-type-options': 'nosniff', 'x-frame-options': 'SAMEORIGIN', 'x-xss-protection': '1; mode=block', 'cf-ray': '92ecea7aabcf67a7-SJC', 'vary': 'Accept-Language, origin', 'content-language': 'en', 'cross-origin-opener-policy': 'same-origin'})
Traceback (most recent call last):
  File "/home/jovyan/work/L7/utils.py", line 40, in call_convrag
    chat_response = client.beta.conversational_rag.create(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/us

## RAG with the AI21 Jamba model in LangChain

In [15]:
from langchain_ai21 import ChatAI21
from langchain_community.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_huggingface.embeddings import HuggingFaceEmbeddings
from langchain_chroma import Chroma
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

In [16]:
llm = ChatAI21(model="jamba-1.5-large",
               max_tokens = 4096,
               temperature = 0.4,
               top_p = 1)

In [17]:
loader = TextLoader("./Nvidia_10K_20240128.txt")
doc = loader.load()

In [18]:
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=2000,
    chunk_overlap=400)
documents = text_splitter.split_documents(doc)
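
To build intuition for `chunk_size` and `chunk_overlap`, here is a deliberately simplified sketch that cuts fixed-size character windows. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter prefers to break on separators such as paragraphs and newlines, and treats `chunk_size` as an upper bound); `chunk_text` is a hypothetical helper for illustration only.

```python
def chunk_text(text, chunk_size, chunk_overlap):
    """Cut fixed-size windows; each starts chunk_size - chunk_overlap after the last."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


chunks = chunk_text("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=4)
print(chunks)
# ['abcdefghij', 'ghijklmnop', 'mnopqrstuv', 'stuvwxyz']
```

The last four characters of each chunk reappear at the start of the next one. That repetition is what the overlap buys: a sentence that straddles a boundary is still fully contained in at least one chunk.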

In [19]:
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
vectorstore = Chroma.from_documents(documents, embedding=embeddings)

### Prompt template

In [20]:
prompt = PromptTemplate.from_template(
    """You are an expert in answering questions based on provided context.
    Answer the question based on the provided context below to the best of your ability.
    The response must be complete, coherent and concise.
    If the answer is not contained in the context, please respond with "answer not in the document"\n
    Here is the context you should use to answer the question: \n
    <context>
    {context}
    </context> \n
    Based on the provided context, answer the following question: {question} \n
    Answer:"""
)

In [21]:
retriever = vectorstore.as_retriever(
    search_type="mmr", 
    search_kwargs={"k": 10})
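
`search_type="mmr"` selects maximal marginal relevance, which trades off relevance to the query against redundancy among the chunks already chosen. The following is a minimal pure-Python sketch of that idea on toy 2-D vectors, not LangChain's implementation. The function names and toy data are made up for illustration; `lambda_mult` mirrors the name of the LangChain search kwarg that controls the trade-off.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mmr(query, candidates, k, lambda_mult=0.5):
    """Return indices of up to k candidates chosen by maximal marginal relevance."""
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def score(i):
            relevance = cosine(query, candidates[i])
            # Similarity to the closest already-selected item (0 if none yet)
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


query = [1.0, 0.0]
candidates = [
    [1.0, 0.10],   # 0: very relevant
    [1.0, 0.11],   # 1: near-duplicate of 0
    [1.0, -1.0],   # 2: less relevant, but points in a different direction
]
print(mmr(query, candidates, k=2))                   # [0, 2]
print(mmr(query, candidates, k=2, lambda_mult=1.0))  # [0, 1]
```

With `lambda_mult=1.0` the selection reduces to plain similarity ranking and picks the near-duplicate. At `0.5` the second slot goes to the more diverse candidate instead.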

In [22]:
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# RAG chain
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
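
The pipe syntax can hide what happens at each step. As a rough mental model, the chain does the equivalent of the plain-Python sketch below: the question is sent to the retriever, the retrieved chunks are joined into one context string, and that context plus the unchanged question fill the prompt, which then goes to the model. Everything here (`TEMPLATE`, `fake_retriever`, `fake_llm`, `build_prompt`, `run_chain`) is a hypothetical stand-in so the example runs offline.

```python
TEMPLATE = "<context>\n{context}\n</context>\nQuestion: {question}\nAnswer:"


def fake_retriever(question):
    # Hypothetical stand-in for retriever.invoke(question)
    return ["first chunk", "second chunk"]


def join_chunks(chunks):
    # Same role as format_docs: chunks separated by blank lines
    return "\n\n".join(chunks)


def fake_llm(prompt_text):
    # Hypothetical stand-in for the chat model; reports the prompt size
    return f"(model received {len(prompt_text)} characters)"


def build_prompt(question):
    # Mirrors the dict step: "context" is computed from the question,
    # "question" is passed through untouched (RunnablePassthrough)
    inputs = {
        "context": join_chunks(fake_retriever(question)),
        "question": question,
    }
    return TEMPLATE.format(**inputs)


def run_chain(question):
    # prompt | llm | StrOutputParser collapses to a plain function call here
    return fake_llm(build_prompt(question))


print(build_prompt("How did revenue change?"))
print(run_chain("How did revenue change?"))
```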

### Query

In [23]:
q = "How has the company revenue and profit changed from last year?"

response = rag_chain.invoke(q)
print(f"Answer: {response}")

Answer: The company's revenue has increased by 126% from the previous year. This growth was primarily driven by higher demand in the U.S. for the Compute & Networking segment. The company's net income also saw a significant rise, increasing by 581% from the previous year.


In [24]:
docs = retriever.invoke(q)
docs

[Document(metadata={'source': './Nvidia_10K_20240128.txt'}, page_content='Income tax expense (benefit) 6.6  (0.7) Net income 48.9  % 16.2  % Reportable Segments Revenue by Reportable Segments Year Ended Jan 28, 2024 Jan 29, 2023 $ Change % Change ($ in millions) Compute & Networking $ 47,405  $ 15,068  $ 32,337  215  % Graphics 13,517  11,906  1,611  14  % Total $ 60,922  $ 26,974  $ 33,948  126  % Operating Income by Reportable Segments Year Ended Jan 28, 2024 Jan 29, 2023 $ Change % Change ($ in millions) Compute & Networking $ 32,016  $ 5,083  $ 26,933  530  % Graphics 5,846  4,552  1,294  28  % All Other (4,890) (5,411) 521  (10) % Total $ 32,972  $ 4,224  $ 28,748  681  % Compute & Networking revenue  – The year-on-year increase was due to higher Data Center revenue. Compute grew 266% due to higher shipments of the NVIDIA Hopper GPU computing platform for the training and inference of LLMs, recommendation engines and generative AI applications. Networking was up 133% due to higher

In [25]:
questions = ["What are the main business risks for the company?",
             "What are the key financial metrics of the company?",
             "What is the profit growth of the company in the reporting period?",
             "Did the company have a cybersecurity incident based on the following SEC filing document?"
]

for q in questions:
    response = rag_chain.invoke(q)
    print("="*80)
    print(f"Question: {q}")
    print(f"Answer: {response}")

Question: What are the main business risks for the company?
Answer: The company faces several business risks, including failure to meet industry needs, competition, inaccuracies in customer demand estimation, dependency on third-party suppliers, product defects, adverse economic conditions, security breaches, business disruptions, climate change, challenges in realizing business investments or acquisitions, reliance on a limited number of partners and distributors, difficulties in attracting and retaining key employees, and potential disruptions to business processes and information systems.
Question: What are the key financial metrics of the company?
Answer: The key financial metrics of the company include:

1. Gross deferred tax assets of $8,000 million in 2024 and $5,154 million in 2023.
2. Total deferred tax assets of $6,448 million in 2024 and $3,670 million in 2023.
3. Deferred tax liabilities of $831 million in 2024 and $522 million in 2023.
4. Net deferred tax asset of $5,617 m

In [26]:
vectorstore.delete_collection()