
# RAG Pipeline for NVIDIA 10-K

Deliverables:

- Build 🏗️

  - Data: NVIDIA 10-K Filings
  - Model: OpenAI text-embedding-3-small, GPT-3.5-turbo
  - Tooling: LangChain or LlamaIndex (you choose)
  - Vector Store: FAISS
  - Additional Component: Add one of the following:
    1. Visibility with WandB, OR
    2. Evaluation with RAGAS


- Ship 🚢

  - Evaluate your answers to the following questions
    - "Who is the E-VP, Operations - and how old are they?"
    - "What is the gross carrying amount of Total Amortizable Intangible Assets for Jan 29, 2023?"

  - Record a Loom video walkthrough (under 10 minutes)
  - $$ Extra Credit: Deploy to a public URL on HF with a Chainlit front end



- Please follow this link to finish the midterm: https://docs.google.com/forms/d/e/1FAIpQLSe967NILaOfNp1gVeGXHdQJoTw-y07iTyOsy5ZZHfrDlYkqwA/viewform




# Dependencies

In [1]:
!pip install -U -q langchain langchain-openai langchain_core langchain-community langchainhub openai

In [2]:
!pip install -qU ragas

In [3]:
!pip install -qU faiss_cpu pymupdf

In [4]:
import os
import openai
from getpass import getpass

openai.api_key = getpass("Please provide your OpenAI Key: ")
os.environ["OPENAI_API_KEY"] = openai.api_key

Please provide your OpenAI Key: ··········


# Data

In [5]:
!git clone https://github.com/dvu4/AI-Engineering.git

fatal: destination path 'AI-Engineering' already exists and is not an empty directory.


In [6]:
from langchain_community.document_loaders import PyMuPDFLoader

loader = PyMuPDFLoader(
    "AI-Engineering/Midterm/NVIDIA 10-k Filings.pdf",
)

documents = loader.load()

In [7]:
# print(f"The document has a length of {len(documents)} ")
# for i in documents:
#   print(f"{i} \n\n")
# print(documents[0])
# # let's check the first element
# print(f"TYPE : {documents[0].type} \n")
# print(f"METADATA : {documents[0].metadata} \n")
# print(f"PAGE CONTENT : {documents[0].page_content} \n ")
# documents[0].page_content
# documents[0].metadata

In [8]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 700,
    chunk_overlap = 50
)

documents = text_splitter.split_documents(documents)
len(documents)

624
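To make the effect of `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of overlapping fixed-size windows. It is a simplification: `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraphs, lines, and spaces, so its real chunks will differ. The function name `sliding_window_chunks` and the sample text are hypothetical.

```python
def sliding_window_chunks(text, chunk_size=700, chunk_overlap=50):
    """Cut text into fixed-size windows that overlap by chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # each window starts this far after the previous one
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


# Hypothetical 1,500-character sample standing in for one page of the filing
sample = "".join(str(i % 10) for i in range(1500))
chunks = sliding_window_chunks(sample)
print([len(c) for c in chunks])
print(chunks[0][-50:] == chunks[1][:50])  # the tail of one chunk repeats at the head of the next
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of some duplicated text in the index.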

# Define questions

In [9]:
q1 = "Who is the E-VP, Operations - and how old are they?"
q2 = "What is the gross carrying amount of Total Amortizable Intangible Assets (In millions) for Jan 29, 2023 ?"

# Retriever

In [10]:
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(
    model="text-embedding-3-small"
)

In [11]:
from langchain_community.vectorstores import FAISS
vector_store = FAISS.from_documents(documents, embeddings)

In [12]:
retriever = vector_store.as_retriever()
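
As a rough picture of what the retriever does at query time, the sketch below ranks a toy index by Euclidean (L2) distance and keeps the nearest entries. It assumes the default behaviour of the LangChain FAISS wrapper (a flat L2 index, with the four nearest chunks returned); the three-dimensional vectors, texts, and helper names are made up for illustration and are not real embeddings.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query_vec, indexed, k=4):
    """Return the k (distance, text) pairs closest to query_vec."""
    scored = [(l2_distance(query_vec, vec), text) for text, vec in indexed]
    return sorted(scored)[:k]


# Hypothetical toy index: (chunk text, embedding vector)
toy_index = [
    ("executive officers and their ages", [0.9, 0.1, 0.0]),
    ("amortizable intangible assets", [0.1, 0.9, 0.0]),
    ("export control restrictions", [0.0, 0.2, 0.9]),
]

query_vec = [0.8, 0.2, 0.1]  # pretend embedding of a question about executives
nearest = top_k(query_vec, toy_index, k=2)
for dist, text in nearest:
    print(f"{dist:.3f}  {text}")
```

A smaller distance means the chunk's embedding is closer to the question's embedding, so those chunks are passed to the LLM as context.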

# Verify the questions with baseline RAG pipeline

In [13]:
retrieved_documents = retriever.invoke(q1)
for doc in retrieved_documents:
  print(doc.page_content)
  print("\n")

supports diverse hiring, retention, and employee engagement, which we believe makes NVIDIA a great place to work.
During fiscal year 2025, we will continue to have a flexible work environment and maintain our company wide 2-days off a quarter for employees to rest and
recharge.
Information About Our Executive Officers
The following sets forth certain information regarding our executive officers, their ages, and positions as of February 16, 2024:
Name
Age
Position
Jen-Hsun Huang
60
President and Chief Executive Officer
Colette M. Kress
56
Executive Vice President and Chief Financial Officer
Ajay K. Puri
69
Executive Vice President, Worldwide Field Operations
Debora Shoquist
69


Debora Shoquist
69
Executive Vice President, Operations
Timothy S. Teter
57
Executive Vice President and General Counsel
Jen-Hsun Huang co-founded NVIDIA in 1993 and has served as our President, Chief Executive Officer, and a member of the Board of Directors since our
inception. From 1985 to 1993, Mr. Huang was 

In [14]:
retrieved_documents = retriever.invoke(q2)
for doc in retrieved_documents:
  print(doc.page_content)
  print("\n")

Table of Contents
NVIDIA Corporation and Subsidiaries
Notes to the Consolidated Financial Statements
(Continued)
Note 7 - Amortizable Intangible Assets
The components of our amortizable intangible assets are as follows:
 
Jan 28, 2024
Jan 29, 2023
 
Gross
Carrying
Amount
Accumulated
Amortization
Net 
Carrying
Amount
Gross
Carrying
Amount
Accumulated
Amortization
Net 
Carrying
Amount
 
(In millions)
Acquisition-related intangible
assets (1)
$
2,642 
$
(1,720)
$
922 
$
3,093 
$
(1,614)
$
1,479 
Patents and licensed technology
449 
(259)
190 
446 
(249)
197 
Total intangible assets
$
3,091 
$
(1,979)
$
1,112 
$
3,539 
$
(1,863)
$
1,676


respectively.
Property, equipment and intangible assets acquired by assuming related liabilities during fiscal years 2024, 2023, and 2022 were $170 million, $374 million, and
$258 million, respectively.
 
Jan 28, 2024
Jan 29, 2023
Other assets:
(In millions)
Prepaid supply and capacity agreements (1)
$
2,458 
$
2,989 
Investments in non-affiliated entitie

# Load Model

In [15]:
from langchain import hub
retrieval_qa_prompt = hub.pull("langchain-ai/retrieval-qa-chat")

In [16]:
from langchain.prompts import ChatPromptTemplate

template = """Answer the question based only on the following context. If you cannot answer the question with the context, please respond with 'I don't know':

Context:
{context}

Question:
{question}
"""

prompt = ChatPromptTemplate.from_template(template)

In [17]:
from operator import itemgetter

from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

primary_qa_llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

retrieval_augmented_qa_chain = (
    # Invoke the chain with: {"question": "<<SOME USER QUESTION>>"}
    # "context"  : the "question" value is piped into the retriever, which returns the matching chunks
    # "question" : the "question" value is passed through unchanged
    {"context": itemgetter("question") | retriever, "question": itemgetter("question")}
    # Pass every key through and re-assign "context" to its own value (the data is unchanged)
    | RunnablePassthrough.assign(context=itemgetter("context"))
    # "response" : "context" and "question" fill the prompt, which is piped into the LLM
    # "context"  : the retrieved chunks are carried forward so they can be inspected or evaluated later
    | {"response": prompt | primary_qa_llm, "context": itemgetter("context")}
)

## Send prompt to LLM model

- The prompt, filled in with the question and the retrieved context, is sent to the primary_qa_llm language model, which generates a response from that input.
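
The data flow can be sketched without any API calls. The example below reuses the prompt template from this notebook but swaps in hypothetical stand-ins (`fake_retriever`, `fake_llm`, `run_chain`) for the FAISS retriever and ChatOpenAI, so it only illustrates how the question and context end up in the prompt and in the returned dictionary. It also joins the chunk texts with blank lines, which is a simplification of how the real chain renders its documents.

```python
TEMPLATE = """Answer the question based only on the following context. If you cannot answer the question with the context, please respond with 'I don't know':

Context:
{context}

Question:
{question}
"""


def fake_retriever(question):
    # Hypothetical stand-in for the FAISS retriever: always returns one fixed chunk
    return ["Debora Shoquist\n69\nExecutive Vice President, Operations"]


def fake_llm(prompt_text):
    # Hypothetical stand-in for ChatOpenAI: reports the prompt size instead of answering
    return f"(model reply to a {len(prompt_text)}-character prompt)"


def run_chain(inputs):
    """Mimic the chain: retrieve context, fill the prompt, call the model."""
    question = inputs["question"]
    context = fake_retriever(question)
    prompt_text = TEMPLATE.format(context="\n\n".join(context), question=question)
    # Return the context alongside the response, as the real chain does
    return {"response": fake_llm(prompt_text), "context": context, "prompt": prompt_text}


result = run_chain({"question": "Who is the E-VP, Operations - and how old are they?"})
print(result["prompt"])
print(result["response"])
```

Returning the context next to the response is what later lets the evaluation step score the retrieved chunks as well as the answer.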



In [18]:
prompt | primary_qa_llm

ChatPromptTemplate(input_variables=['context', 'question'], messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], template="Answer the question based only on the following context. If you cannot answer the question with the context, please respond with 'I don't know':\n\nContext:\n{context}\n\nQuestion:\n{question}\n"))])
| ChatOpenAI(client=<openai.resources.chat.completions.Completions object at 0x7ec9ad0feb60>, async_client=<openai.resources.chat.completions.AsyncCompletions object at 0x7ec9ad0cc040>, temperature=0.0, openai_api_key=SecretStr('**********'), openai_proxy='')

In [19]:
result = retrieval_augmented_qa_chain.invoke({"question" : q1})
print(f"The question : \n {q1} \n The answer:")
print(result["response"].content)

The question : 
 Who is the E-VP, Operations - and how old are they? 
 The answer:
Debora Shoquist is the Executive Vice President, Operations, and she is 69 years old.


In [20]:
result = retrieval_augmented_qa_chain.invoke({"question" : q2})
print(f"The question : \n {q2} \n The answer:")
print(result["response"].content)

The question : 
 What is the gross carrying amount of Total Amortizable Intangible Assets (In millions) for Jan 29, 2023 ? 
 The answer:
$3,539


NVIDIA filing for the quarter ended October 29, 2023: https://www.sec.gov/Archives/edgar/data/1045810/000104581023000227/nvda-20231029.htm

In [22]:
# # let's test some other questions

# q3 = "What is the gross carrying amount of Total Amortizable Intangible Assets for October 29, 2023? " # answer is 3,092
# result = retrieval_augmented_qa_chain.invoke({"question" : q3})
# print(f"The question : \n {q3} \n The answer:")
# print(result["response"].content)

# q4 = "What is the Net Carrying Amount of Total Amortizable Intangible Assets for October 29, 2023? " # answer is 1,251
# result = retrieval_augmented_qa_chain.invoke({"question" : q4})
# print(f"The question : \n {q4} \n The answer:")
# print(result["response"].content)

# q5 = "What is the Accumulated Amortization of Total Amortizable Intangible Assets for October 29, 2023? " # answer is 1,841
# result = retrieval_augmented_qa_chain.invoke({"question" : q5})
# print(f"The question : \n {q5} \n The answer:")
# print(result["response"].content)

# # Future Amortization Expense (In millions)
# # q6 =  "What is the Future Amortization Expense (In millions) in Fiscal Year of 2024 ?"
# q6 =  "What is the total Future Amortization Expense (In millions)?"
# result = retrieval_augmented_qa_chain.invoke({"question" : q6})
# print(f"The question : \n {q6} \n The answer:")
# print(result["response"].content)

# Evaluation and Testing
## *Synthetic Test Set Generation*

In [21]:
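# NOTE: `documents` already holds the 700-character chunks created above, so splitting
# them again with a larger chunk_size leaves them unchanged (still 624 chunks).
# To build larger evaluation chunks, start from the raw pages: eval_documents = loader.load()
eval_documents = documents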

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 1500,
    chunk_overlap = 450
)

eval_documents = text_splitter.split_documents(eval_documents)

len(eval_documents)

624

In [23]:
from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context

generator = TestsetGenerator.with_openai()

testset = generator.generate_with_langchain_docs(
    eval_documents,
    test_size=10,
    # Evolution weights are relative shares and should sum to 1.0
    distributions={simple: 0.2, reasoning: 0.4, multi_context: 0.4},
)

In [None]:
# testset.test_data[0]

In [None]:
test_df = testset.to_pandas()
test_questions = test_df["question"].values.tolist()
test_groundtruths = test_df["ground_truth"].values.tolist()

In [41]:
answers = []
contexts = []

for question in test_questions:
  response = retrieval_augmented_qa_chain.invoke({"question" : question})
  answers.append(response["response"].content)
  contexts.append([context.page_content for context in response["context"]])

In [42]:
from datasets import Dataset

response_dataset = Dataset.from_dict({
    "question" : test_questions,
    "answer" : answers,
    "contexts" : contexts,
    "ground_truth" : test_groundtruths
})

In [43]:
response_dataset[0]

{'question': 'What is NVIDIA DLSS and how does it enhance the gaming experience?',
 'answer': 'NVIDIA DLSS is deep learning super sampling, which is an AI technology that enhances the gaming experience by improving graphics quality and performance through the use of artificial intelligence algorithms. It helps to generate smoother, higher quality graphics in games by using AI to upscale lower resolution images to higher resolutions in real-time.',
 'contexts': ['gamers who remaster games, and creators.\nOur gaming platforms leverage our GPUs and sophisticated software to enhance the gaming experience with smoother, higher quality graphics. We developed\nNVIDIA RTX to bring next generation graphics and AI to games. NVIDIA RTX features ray tracing technology for real-time, cinematic-quality rendering. Ray\ntracing, which has long been used for special effects in the movie industry, is a computationally intensive technique that simulates the physical behavior of light\nto achieve greater 

# Evaluate

In [44]:
from ragas import evaluate
from ragas.metrics import (
    faithfulness,
    answer_relevancy,
    answer_correctness,
    context_recall,
    context_precision,
)

metrics = [
    faithfulness,
    answer_relevancy,
    context_recall,
    context_precision,
    answer_correctness,
]

In [45]:
results = evaluate(response_dataset, metrics)


In [None]:
results

In [46]:
results_df = results.to_pandas()
results_df

| | question | answer | contexts | ground_truth | faithfulness | answer_relevancy | context_recall | context_precision | answer_correctness |
|---|---|---|---|---|---|---|---|---|---|
| 0 | What is NVIDIA DLSS and how does it enhance th... | NVIDIA DLSS is deep learning super sampling, w... | [gamers who remaster games, and creators.\nOur... | NVIDIA DLSS is an AI technology that enhances ... | 1.0 | 0.953094 | 1.0 | 1.0 | 0.545016 |
| 1 | What are target markets in relation to custome... | Target markets in relation to customer program... | [distributors for specific products are contra... | | 0.666667 | 1.0 | 0.0 | 0.0 | 0.181299 |
| 2 | "What factors can cause inconsistent spikes an... | Factors that can cause inconsistent spikes and... | [impact demand for our products, including by ... | Factors that can cause inconsistent spikes and... | 1.0 | 0.930344 | 1.0 | 1.0 | 0.997675 |
| 3 | How can competitors' design wins impact our bu... | Competitors' design wins can impact our busine... | [comprehensive IP portfolios and patent protec... | Competitors' design wins can impact our busine... | | 0.949377 | 1.0 | 0.75 | 0.811417 |
| 4 | Which supercomputing chips are affected by exp... | The A100 and H100 integrated circuits are affe... | [supercomputing industries. These restrictions... | The export restrictions impact the A100 and H1... | 1.0 | 0.901975 | 1.0 | 1.0 | 0.738835 |
| 5 | "How is revenue distribution determined based ... | Revenue distribution is determined based on th... | [10 \n(2)\n— \nTotal\n$\n(4,890)\n$\n(5,411)\n... | Revenue by geographic region is designated bas... | 1.0 | 0.896599 | 1.0 | 1.0 | 0.531571 |
| 6 | What are the consequences of violating export ... | The consequences of violating export control l... | [and generally fulfill our contractual obligat... | If we were ever found to have violated export ... | 1.0 | 0.955203 | 1.0 | 1.0 | 0.774824 |
| 7 | How does the integration of cloud-based infras... | The integration of cloud-based infrastructure ... | [to some of ours and can use or develop their ... | The integration of cloud-based infrastructure ... | 1.0 | 0.968168 | 1.0 | 1.0 | 0.48937 |
| 8 | "What is the role of solution architects in im... | The role of solution architects is to work wit... | [example, our solution architects work with CS... | The role of solution architects is to improve ... | 1.0 | 0.909947 | 1.0 | 1.0 | 0.484921 |
| 9 | "What factors can cause demand fluctuations fo... | Volatility in the cryptocurrency market, new c... | [impact demand for our products, including by ... | Cryptocurrency mining's impact on Gaming GPUs ... | 1.0 | 0.858263 | 1.0 | 1.0 | 0.375566 |
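
To compare pipelines it helps to reduce the per-question scores to one number per metric. The sketch below averages the scores copied from the results table above, using only the standard library. It assumes the blank faithfulness cell in row 3 is a missing value and skips it; the helper name `mean_ignoring_nan` is hypothetical.

```python
import math

NAN = float("nan")  # assumption: the blank faithfulness cell in row 3 is a missing value

# Scores copied from the results table, one list entry per test question (rows 0-9)
scores = {
    "faithfulness": [1.0, 0.666667, 1.0, NAN, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
    "answer_relevancy": [0.953094, 1.0, 0.930344, 0.949377, 0.901975,
                         0.896599, 0.955203, 0.968168, 0.909947, 0.858263],
    "context_recall": [1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
    "context_precision": [1.0, 0.0, 1.0, 0.75, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
    "answer_correctness": [0.545016, 0.181299, 0.997675, 0.811417, 0.738835,
                           0.531571, 0.774824, 0.48937, 0.484921, 0.375566],
}


def mean_ignoring_nan(values):
    """Average the values, skipping NaN entries."""
    valid = [v for v in values if not math.isnan(v)]
    return sum(valid) / len(valid)


summary = {metric: mean_ignoring_nan(values) for metric, values in scores.items()}
for metric, value in summary.items():
    print(f"{metric:<20}{value:.3f}")
```

With only ten synthetic questions, a single weak row (such as row 1, which has no ground truth) moves these averages noticeably, so they are best read as a rough baseline.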


# Making Adjustments to our RAG Pipeline

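- MultiQueryRetriever uses the LLM to rewrite the user question into several alternative phrasings, runs the base retriever for each one, and merges the unique results. This can surface relevant context that a single phrasing of the question would miss.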
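
The merge step of a multi-query retriever can be sketched as follows. The rephrasings, the lookup table of hits, the chunk ids, and the function names are all hypothetical stand-ins for the LLM and the FAISS retriever; the point is only the "retrieve per variant, then keep the unique union" logic.

```python
# Hypothetical rephrasings that an LLM might produce for one user question
VARIANTS = [
    "Who is the Executive Vice President of Operations?",
    "Which executive officer leads Operations, and what is their age?",
    "How old is NVIDIA's EVP of Operations?",
]

# Hypothetical retriever results: each variant maps to the chunk ids it would return
FAKE_HITS = {
    VARIANTS[0]: ["chunk-12", "chunk-13"],
    VARIANTS[1]: ["chunk-13", "chunk-14"],
    VARIANTS[2]: ["chunk-12", "chunk-40"],
}


def fake_retrieve(query):
    # Stand-in for the base retriever
    return FAKE_HITS.get(query, [])


def multi_query_retrieve(variants):
    """Run the retriever once per variant and keep each chunk the first time it appears."""
    seen = set()
    merged = []
    for variant in variants:
        for chunk_id in fake_retrieve(variant):
            if chunk_id not in seen:
                seen.add(chunk_id)
                merged.append(chunk_id)
    return merged


merged_chunks = multi_query_retrieve(VARIANTS)
print(merged_chunks)
```

The trade-off is cost: each question now needs an extra LLM call to generate the variants plus one retrieval per variant, and the merged context passed to the model is longer.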

In [47]:
from langchain.retrievers import MultiQueryRetriever

advanced_retriever = MultiQueryRetriever.from_llm(retriever=retriever, llm=primary_qa_llm)

In [48]:
from langchain.chains.combine_documents import create_stuff_documents_chain

document_chain = create_stuff_documents_chain(primary_qa_llm, retrieval_qa_prompt)

In [49]:
from langchain.chains import create_retrieval_chain

retrieval_chain = create_retrieval_chain(advanced_retriever, document_chain)

# RAG Responses

In [50]:
response = retrieval_chain.invoke({"input": q1})
print(response["answer"])

The Executive Vice President of Operations is Debora Shoquist, who is 69 years old.


In [51]:
response = retrieval_chain.invoke({"input": q2})
print(response["answer"])

The gross carrying amount of Total Amortizable Intangible Assets for Jan 29, 2023, is $3,539 million.


In [60]:
# print(q3)
# response = retrieval_chain.invoke({"input": "What is the gross carrying amount of Total Amortizable Intangible Assets for Jan 28, 2024? "})
# print(response["answer"])

# print(q4)
# response = retrieval_chain.invoke({"input": q4})
# print(response["answer"])

# print(q5)
# response = retrieval_chain.invoke({"input": q5})
# print(response["answer"])

# print(q6)
# response = retrieval_chain.invoke({"input": q6})
# print(response["answer"])

What is the gross carrying amount of Total Amortizable Intangible Assets for October 29, 2023? 
The gross carrying amount of Total Amortizable Intangible Assets for Jan 28, 2024, is $3,091 million.


In [None]:
answers = []
contexts = []

for question in test_questions:
  response = retrieval_chain.invoke({"input" : question})
  answers.append(response["answer"])
  contexts.append([context.page_content for context in response["context"]])

In [None]:
response_dataset_advanced_retrieval = Dataset.from_dict({
    "question" : test_questions,
    "answer" : answers,
    "contexts" : contexts,
    "ground_truth" : test_groundtruths
})

In [None]:
advanced_retrieval_results = evaluate(response_dataset_advanced_retrieval, metrics)

In [None]:
advanced_retrieval_results_df = advanced_retrieval_results.to_pandas()
advanced_retrieval_results_df