# **Advanced RAG: Recursive Answering Retriever**

## Install Libraries

In [13]:
! pip install -q sentence-transformers
! pip install -q unstructured langchain
! pip install -q "unstructured[all-docs]"






In [None]:
! pip install langchain_community fastembed chromadb ollama

## Constants

In [None]:
# Define the directory where your PDFs are stored
pdf_directory = "C:/Users/ili/Downloads/files_rag"
save_dir = pdf_directory

## **1. Extract Texts from PDFs**
Use **PyPDFLoader** from `langchain_community` to extract text <br>
from **multiple PDFs**.

In [None]:
# general
import os
import datetime

# LangChain
from langchain_community.document_loaders import PyPDFLoader

# Get a list of all PDF files in the directory
pdf_files = [f for f in os.listdir(pdf_directory) if f.endswith('.pdf')]

# Initialize a list to hold the pages from all Nvidia PDFs
nvidia_pages = []


# Iterate through each PDF file and load it
for pdf_file in pdf_files:
    file_path = os.path.join(pdf_directory, pdf_file)
    print(f"Processing file: {file_path}\n")

    # Load the PDF and split it into pages
    loader = PyPDFLoader(file_path=file_path)
    pages = loader.load()


    nvidia_pages.extend(pages)





Processing file: C:/Users/ili/Downloads/files_rag\Nvidia_Q2FY25-CFO-Commentary.pdf

Processing file: C:/Users/ili/Downloads/files_rag\Nvidia_Rev_by_Mkt_Qtrly_Trend_Q225.pdf



In [None]:
# Print the first page of the first document as an example
if nvidia_pages:
    print("=========================================")
    print("First page of the first Nvidia document:")
    print("=========================================\n")
    print(nvidia_pages[0].page_content)
else:
    print("No Nvidia pages found in the PDFs.")

First page of the first Nvidia document:

CFO Commentary on Second Quarter Fiscal 2025 Results
Q2 Fiscal 2025 Summary
GAAP
($ in millions, except earnings per 
share) Q2 FY25 Q1 FY25 Q2 FY24 Q/Q Y/Y
Revenue $30,040 $26,044 $13,507 Up 15% Up 122%
Gross margin  75.1 %  78.4 %  70.1 % Down 3.3 pts Up 5.0 pts
Operating expenses $3,932 $3,497 $2,662 Up 12% Up 48%
Operating income $18,642 $16,909 $6,800 Up 10% Up 174%
Net income $16,599 $14,881 $6,188 Up 12% Up 168%
Diluted earnings per share $0.67 $0.60 $0.25 Up 12% Up 168%
Non-GAAP
($ in millions, except earnings per 
share) Q2 FY25 Q1 FY25 Q2 FY24 Q/Q Y/Y
Revenue $30,040 $26,044 $13,507 Up 15% Up 122%
Gross margin  75.7 %  78.9 %  71.2 % Down 3.2 pts Up 4.5 pts
Operating expenses $2,792 $2,501 $1,838 Up 12% Up 52%
Operating income $19,937 $18,059 $7,776 Up 10% Up 156%
Net income $16,952 $15,238 $6,740 Up 11% Up 152%
Diluted earnings per share $0.68 $0.61 $0.27 Up 11% Up 152%
Revenue by Reportable Segments
($ in millions) Q2 FY25 Q1 FY25 Q

## **2. Split Text**
We'll use RecursiveCharacterTextSplitter to break down the large text bodies from the PDFs into manageable chunks.

### 2.1 Text Splitter

In [None]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Initialize the text splitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=7500, chunk_overlap=100)

# Split text into chunks for Nvidia pages
nvidia_text_chunks = []
for page in nvidia_pages:
    chunks = text_splitter.split_text(page.page_content)
    nvidia_text_chunks.extend(chunks)
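
To make the idea behind recursive splitting more concrete, here is a simplified pure-Python sketch. It is not LangChain's implementation: the function name `recursive_split` and the separator list are illustrative assumptions, and `chunk_overlap` is ignored. The sketch tries the coarsest separator first and only falls back to finer separators for pieces that are still too long.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Split text into chunks of at most chunk_size characters.

    Simplified illustration only: no overlap handling.
    """
    if len(text) <= chunk_size:
        return [text] if text else []

    if not separators or separators[0] == "":
        # Last resort: hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, rest = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging small pieces
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: retry with finer separators.
            chunks.extend(recursive_split(piece, chunk_size, rest))
    if current:
        chunks.append(current)
    return chunks


sample = "Revenue grew.\n\nGross margin fell slightly.\n\nOperating income rose."
for chunk in recursive_split(sample, chunk_size=30):
    print(repr(chunk))
```

With `chunk_size=7500`, as used in this notebook, most PDF pages fit into a single chunk, so the splitter mainly acts as a safety net for unusually long pages.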



### 2.2 Add Metadata

In [None]:


# Example metadata management (customize as needed)
def add_metadata(chunks, doc_title):
    metadata_chunks = []
    for chunk in chunks:
        metadata = {
            "title": doc_title,
            "author": "company",  # Update based on document data
            "date": str(datetime.date.today())  # Ingestion date, not the report's publication date
        }
        metadata_chunks.append({"text": chunk, "metadata": metadata})
    return metadata_chunks

# Add metadata to Nvidia chunks
nvidia_chunks_with_metadata = add_metadata(nvidia_text_chunks, "NVIDIA Financial Report")
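
The helper above stamps every chunk with the same title. One possible variation keeps per-page provenance, so a retrieved chunk can be traced back to its file and page. This sketch assumes each page object exposes `page_content` and a `metadata` dict with `source` and `page` keys, which is what `PyPDFLoader` typically produces. `FakePage`, `chunks_with_provenance`, and `fixed_width` are hypothetical names used only for illustration, and the fixed-width splitter stands in for the real one.

```python
from dataclasses import dataclass, field


@dataclass
class FakePage:
    """Stand-in for a loaded PDF page (hypothetical, for illustration only)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def chunks_with_provenance(pages, split_fn, doc_title):
    """Split each page and copy its source/page metadata onto every chunk."""
    records = []
    for page in pages:
        for chunk in split_fn(page.page_content):
            records.append({
                "text": chunk,
                "metadata": {
                    "title": doc_title,
                    "source": page.metadata.get("source", "unknown"),
                    "page": page.metadata.get("page", -1),
                },
            })
    return records


def fixed_width(text, width=20):
    """Toy splitter: cut the text every `width` characters."""
    return [text[i:i + width] for i in range(0, len(text), width)]


pages = [
    FakePage("Text of the first page, long enough to split.", {"source": "a.pdf", "page": 0}),
    FakePage("Second page.", {"source": "a.pdf", "page": 1}),
]
records = chunks_with_provenance(pages, fixed_width, "NVIDIA Financial Report")
print(records[-1]["metadata"])
```

In the notebook, `pages` would be `nvidia_pages` and `split_fn` would be `text_splitter.split_text`.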



## **3. Create Embedding from text chunks**

In [None]:
!ollama pull nomic-embed-text:v1.5

In [None]:
! ollama list

NAME                 	ID          	SIZE  	MODIFIED   
nomic-embed-text:v1.5	0a109f422b47	274 MB	2 days ago	
llama3:latest        	365c0bd3c000	4.7 GB	5 days ago	


In [None]:
import ollama

# Function to generate embeddings for text chunks
def generate_embeddings(text_chunks, model_name='nomic-embed-text:v1.5'):
    embeddings = []
    for chunk in text_chunks:
        # Generate the embedding for each chunk; the vector itself is in the response's `embedding` field
        embedding = ollama.embeddings(model=model_name, prompt=chunk)
        embeddings.append(embedding)
    return embeddings

## Example

In [None]:
# Example: Embed Nvidia text chunks
nvidia_texts = [chunk["text"] for chunk in nvidia_chunks_with_metadata]
nvidia_embeddings = generate_embeddings(nvidia_texts)

nvidia_embeddings

[EmbeddingsResponse(embedding=[-0.30074572563171387, 0.5343104600906372, -2.5193967819213867, -0.012859825044870377, 0.9987686276435852, -1.0452942848205566, 0.5030638575553894, -0.03170178458094597, 0.8926381468772888, 0.43043744564056396, 0.7720985412597656, -0.28202101588249207, 0.2130921632051468, -0.9610076546669006, 0.804760217666626, -0.057274192571640015, -1.072660207748413, -1.3614227771759033, 0.9926325082778931, 0.08230037987232208, -0.04858648404479027, -0.6780625581741333, -1.5120474100112915, 0.19199475646018982, 0.9052993655204773, 1.0891437530517578, -0.10640035569667816, -0.19310790300369263, 0.1501728594303131, 0.3435661792755127, 0.963753342628479, -0.6903637051582336, 0.04898902028799057, 0.01368196401745081, -1.4158802032470703, -0.2851352393627167, -0.04161335155367851, 0.2815900146961212, 0.8641917705535889, -0.04214046895503998, 0.8183640837669373, -0.6694063544273376, 0.44584712386131287, -1.1866129636764526, -0.07618328183889389, -0.019702916964888573, -0.1831

## **4. Store and Use Embeddings in Chroma DB**
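Chroma DB stores each chunk together with its embedding for efficient retrieval. Note that `Chroma.from_documents` computes the embeddings itself through the `OllamaEmbeddings` wrapper, so the list generated in step 3 is not reused here.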
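
Conceptually, retrieval ranks the stored chunks by how similar their embeddings are to the query embedding. The sketch below shows this with brute-force cosine similarity over made-up 3-dimensional vectors. The names `cosine_similarity` and `top_k` are hypothetical, and a real vector store such as Chroma uses an index and its own distance settings rather than this loop.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, stored, k=2):
    """Return the k (score, text) pairs most similar to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in stored]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy vectors invented for illustration; real embeddings have hundreds of dimensions.
stored = [
    ("data center revenue", [0.9, 0.1, 0.0]),
    ("gaming revenue", [0.1, 0.9, 0.0]),
    ("share repurchases", [0.0, 0.1, 0.9]),
]
query = [0.8, 0.2, 0.0]
for score, text in top_k(query, stored):
    print(f"{score:.3f}  {text}")
```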

### **CHROMADB**

In [None]:
from langchain_community.vectorstores import Chroma
from langchain.schema import Document
from langchain_community.embeddings import OllamaEmbeddings

# Wrap Nvidia texts with their respective metadata into Document objects
nvidia_documents = [Document(page_content=chunk['text'], metadata=chunk['metadata']) for chunk in nvidia_chunks_with_metadata]


# Embed the Nvidia documents and store them in a Chroma collection
nvidia_vector_db = Chroma.from_documents(documents=nvidia_documents,
                      embedding=OllamaEmbeddings(model="nomic-embed-text:v1.5",show_progress=False),
                      collection_name="nvidia-local-rag")



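## **5. Query Processing: Decomposition and Recursive Answering**

Decompose the user question into sub-questions, fetch the most relevant chunks from Chroma DB for each one, and answer them in sequence so that each answer can build on the previous question and answer pairs.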

In [14]:
from langchain.prompts import ChatPromptTemplate, PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_community.chat_models import ChatOllama
from langchain_core.runnables import RunnablePassthrough
from langchain.retrievers.multi_query import MultiQueryRetriever

In [15]:
# LLM from Ollama
local_model = "llama3:latest"
llm = ChatOllama(model=local_model)

In [35]:
from langchain.prompts import ChatPromptTemplate

# Decomposition
template = """You are a helpful assistant that generates multiple sub-questions related to an input question. \n
The goal is to break down the input into a set of sub-problems / sub-questions that can be answered in isolation. \n
Generate multiple search queries related to: {question} \n
Output only 3 queries:"""
prompt_decomposition = ChatPromptTemplate.from_template(template)

In [36]:
# Chain
generate_queries_decomposition = ( prompt_decomposition | llm | StrOutputParser() | (lambda x: x.split("\n")))

In [42]:
question = '''What are the main revenue drivers for Nvidia this fiscal year?'''
questions = generate_queries_decomposition.invoke({"question":question})
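
Splitting on newlines keeps every line of the model's reply, including any preamble, blank lines, and explanations. A small filter can pull out just the sub-questions. The sketch assumes the model numbers its questions as `1.`, `2.`, and so on; `extract_numbered_questions` is a hypothetical helper, the sample lines are shortened for illustration, and other reply formats would need a different pattern.

```python
import re


def extract_numbered_questions(lines, limit=3):
    """Keep only lines that look like '1. question' and strip Markdown bold markers."""
    pattern = re.compile(r"^\s*\d+[.)]\s*(.+?)\s*$")
    found = []
    for line in lines:
        match = pattern.match(line)
        if not match:
            continue  # preamble, blank line, or bullet explanation
        text = match.group(1).strip("*").strip()
        if text:
            found.append(text)
    return found[:limit]


# Shortened sample reply, invented for illustration.
raw_lines = [
    "Here are three potential sub-questions:",
    "",
    "1. **What is Nvidia's primary business segment in terms of revenue generation?**",
    "\t* This question helps narrow down the scope.",
    "2. **Which specific product lines drive revenue?**",
]
print(extract_numbered_questions(raw_lines))
```

Applied to the chain output, this would be `questions = extract_numbered_questions(questions)`.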

In [43]:
questions 

['Here are three potential sub-questions related to "What are the main revenue drivers for Nvidia this fiscal year?"',
 '',
 "1. **What is Nvidia's primary business segment in terms of revenue generation?**",
 "\t* This question helps narrow down the scope and focus on whether Nvidia's revenue is primarily driven by its GPU, Tegra processor, or other segments such as datacenter or autonomous driving.",
 "2. **Which specific product lines or products are responsible for a significant portion of Nvidia's revenue this fiscal year?**",
 "\t* This sub-question seeks to identify which specific products, such as GeForce GPUs, Quadro professional GPUs, Tesla V100, or Tegra X1, have the most impact on Nvidia's overall revenue.",
 "3. **What are the key factors driving growth in Nvidia's gaming and professional visualization segments this fiscal year?**",
 "\t* This question delves deeper into the specific drivers of revenue growth for Nvidia's core businesses, such as new game releases, adoptio

In [44]:
# The raw output also contains a preamble and explanations, so keep only the three sub-questions
questions = ["1. What is Nvidia's primary business segment in terms of revenue generation?",
 "2. Which specific product lines or products are responsible for a significant portion of Nvidia's revenue this fiscal year?",
 "3. What are the key factors driving growth in Nvidia's gaming and professional visualization segments this fiscal year?"]
 

In [45]:
# Prompt
template = """Here is the question you need to answer:

\n --- \n {question} \n --- \n

Here is any available background question + answer pairs:

\n --- \n {q_a_pairs} \n --- \n

Here is additional context relevant to the question: 

\n --- \n {context} \n --- \n

Use the above context and any background question + answer pairs to answer the question: \n {question}
"""

decomposition_prompt = ChatPromptTemplate.from_template(template)

In [46]:
from operator import itemgetter
from langchain_core.output_parsers import StrOutputParser

def format_qa_pair(question, answer):
    """Format a question and answer pair as a single text block."""
    return f"Question: {question}\nAnswer: {answer}".strip()

# Build the chain once: retrieve context for the sub-question, then answer it
# using that context plus all previously answered sub-questions.
rag_chain = (
    {"context": itemgetter("question") | nvidia_vector_db.as_retriever(),
     "question": itemgetter("question"),
     "q_a_pairs": itemgetter("q_a_pairs")}
    | decomposition_prompt
    | llm
    | StrOutputParser())

q_a_pairs = ""
for q in questions:
    answer = rag_chain.invoke({"question": q, "q_a_pairs": q_a_pairs})
    q_a_pair = format_qa_pair(q, answer)
    q_a_pairs = q_a_pairs + "\n---\n" + q_a_pair
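
The loop above is what makes the answering recursive: each sub-question is answered with the question and answer pairs accumulated so far, and the answer to the last sub-question is the final result. The sketch below replaces the retriever and the LLM with stub functions (`fake_retrieve` and `fake_llm`, both hypothetical) to show how `q_a_pairs` grows from one iteration to the next.

```python
def fake_retrieve(question):
    """Stub retriever (hypothetical): returns a canned context string."""
    return f"<context for: {question}>"


def fake_llm(question, context, q_a_pairs):
    """Stub LLM (hypothetical): reports how many earlier pairs it received."""
    n_background = q_a_pairs.count("Question:")
    return f"answer using {n_background} earlier pair(s)"


def answer_recursively(questions, retrieve, llm):
    """Answer sub-questions in order, feeding earlier Q&A pairs into later ones."""
    q_a_pairs = ""
    answer = ""
    for q in questions:
        context = retrieve(q)
        answer = llm(q, context, q_a_pairs)
        q_a_pairs += f"\n---\nQuestion: {q}\nAnswer: {answer}"
    return answer, q_a_pairs


final_answer, history = answer_recursively(
    ["Sub-question A?", "Sub-question B?", "Sub-question C?"],
    fake_retrieve,
    fake_llm,
)
print(final_answer)
print(history)
```

Because later answers depend on earlier ones, the order of the sub-questions matters: broader questions are best placed first.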

    

In [47]:
answer


"Based on the provided financial report, there is no specific information that highlights the key factors driving growth in Nvidia's gaming and professional visualization segments for this fiscal year. The report only provides a summary of the company's performance and outlook for the third quarter of fiscal 2025.\n\nHowever, we can analyze the trend of revenue in these segments to identify possible drivers of growth. The report shows that:\n\n* Gaming segment revenue has been growing consistently, from $2,240 million in Q1 FY24 to $2,880 million in Q2 FY25.\n* Professional Visualization segment revenue has also been growing, albeit at a slower pace, from $295 million in Q1 FY24 to $454 million in Q2 FY25.\n\nWhile the report does not provide specific information on what drives this growth, it is possible that factors such as:\n\n* Increased demand for gaming and visualization solutions due to advancements in AI, machine learning, and other technologies\n* Growing adoption of Nvidia's 

## **6. Chatting with Local RAG: Hugging Face Embeddings + Llama 3 for Better Speed**

In [48]:
from langchain_community.llms.ollama import Ollama


local_model = "llama3:latest"
cached_llm = Ollama(model=local_model)



In [49]:
from langchain_community.vectorstores import Chroma
from langchain.schema import Document
from langchain_community.embeddings import HuggingFaceEmbeddings

# Wrap Nvidia texts with their respective metadata into Document objects
nvidia_documents = [Document(page_content=chunk['text'], metadata=chunk['metadata']) for chunk in nvidia_chunks_with_metadata]

# Use a small Hugging Face sentence-transformers model (384-dimensional embeddings)
model_name = "sentence-transformers/all-MiniLM-L6-v2"
model_kwargs = {'device': 'cpu'}
encode_kwargs = {'normalize_embeddings': False}
hf = HuggingFaceEmbeddings(
    model_name=model_name,
    model_kwargs=model_kwargs,
    encode_kwargs=encode_kwargs
)

# Embed the Nvidia documents with the Hugging Face model and store them in a separate collection
nvidia_vector_db_hf = Chroma.from_documents(documents=nvidia_documents,
                      embedding=hf,
                      collection_name="nvidia-local-rag-384")
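
Whether the smaller embedding model actually improves speed depends on the machine, so it is worth measuring. The helper below is a minimal timing sketch: `time_call` is a hypothetical name, and the two stand-in functions only simulate a slow and a fast step with `time.sleep`.

```python
import time


def time_call(fn, repeats=3):
    """Return the best wall-clock time in seconds over several runs of fn()."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def slow_stand_in():
    time.sleep(0.05)  # simulates a slower retrieval step


def fast_stand_in():
    time.sleep(0.01)  # simulates a faster retrieval step


print(f"slow: {time_call(slow_stand_in):.3f}s")
print(f"fast: {time_call(fast_stand_in):.3f}s")
```

In the notebook, the stand-ins could be replaced with callables that query each vector store, for example `lambda: nvidia_vector_db_hf.as_retriever().invoke(question)`.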




In [50]:
# Prompt
template = """Here is the question you need to answer:

\n --- \n {question} \n --- \n

Here is any available background question + answer pairs:

\n --- \n {q_a_pairs} \n --- \n

Here is additional context relevant to the question: 

\n --- \n {context} \n --- \n

Use the above context and any background question + answer pairs to answer the question: \n {question}
"""

decomposition_prompt = ChatPromptTemplate.from_template(template)

In [52]:
from operator import itemgetter
from langchain_core.output_parsers import StrOutputParser

def format_qa_pair(question, answer):
    """Format a question and answer pair as a single text block."""
    return f"Question: {question}\nAnswer: {answer}".strip()

# Same chain as before, now backed by the Hugging Face vector store and the Ollama LLM wrapper
rag_chain = (
    {"context": itemgetter("question") | nvidia_vector_db_hf.as_retriever(),
     "question": itemgetter("question"),
     "q_a_pairs": itemgetter("q_a_pairs")}
    | decomposition_prompt
    | cached_llm
    | StrOutputParser())

q_a_pairs = ""
for q in questions:
    answer = rag_chain.invoke({"question": q, "q_a_pairs": q_a_pairs})
    q_a_pair = format_qa_pair(q, answer)
    q_a_pairs = q_a_pairs + "\n---\n" + q_a_pair

    

In [53]:
answer 

"Unfortunately, the provided financial report does not explicitly mention the key factors driving growth in Nvidia's gaming and professional visualization segments this fiscal year.\n\nHowever, we can analyze the report to extract some relevant information that might help answer this question. Here are a few observations:\n\n1. **Gross margin expansion**: The gross margin for both the gaming and professional visualization segments has increased compared to the same period last year (76.6% vs 68.2%, and 77.2% vs 69.7%). This could be driven by factors such as improved manufacturing efficiency, better product mix, or pricing power.\n2. **Non-GAAP operating income growth**: Both segments have seen significant growth in non-GAAP operating income (37.997 billion vs 10.828 billion for gaming, and 32.189 billion vs 9.453 billion for professional visualization). This could be attributed to factors such as increased sales volumes, higher prices, or reduced costs.\n3. **Gaming segment**: The gam