In [None]:
# **AI-Powered Financial Analysis System - Project Prerequisites**

This document outlines the necessary setup to complete all modules of the RAG learning roadmap. Please configure these items before starting Module 1.

### **Required API Keys**

You will need to create free accounts for the following services and obtain API keys. We will load these into our Google Colab environment using the secrets manager.

1.  **Groq API Key**
    * **Purpose:** Provides access to high-speed LLM inference.
    * **Get it here:** [https://console.groq.com/keys](https://console.groq.com/keys)

2.  **Hugging Face User Access Token**
    * **Purpose:** Allows us to download models and use the Hugging Face ecosystem. A `read` role is sufficient.
    * **Get it here:** [https://huggingface.co/settings/tokens](https://huggingface.co/settings/tokens)

### **Required Files & Environment**

1.  **Environment:** Google Colab.
2.  **Source Document:** The **NVIDIA Q1 FY26 Earnings Report** PDF.
    * **Action:** Download the press release PDF from the official NVIDIA investor relations site: [NVIDIA Q1 FY2026 Financial Results](https://investor.nvidia.com/financial-info/quarterly-results/default.aspx).
    * **Direct PDF link:** [https://s201.q4cdn.com/141608511/files/doc_financials/2026/q1/b6df1c5c-5cb6-4a41-9d28-dd1bcd34cc26.pdf](https://s201.q4cdn.com/141608511/files/doc_financials/2026/q1/b6df1c5c-5cb6-4a41-9d28-dd1bcd34cc26.pdf)
    * Upload the PDF to your Colab session at the start of each module, saved as `NVIDIA-Q1-FY26-Financial-Results.pdf` (the filename the code in Step 3 expects).

---


# **Module 1: The Foundation - Basic RAG**

### **Objective**
In this module, we will build the simplest possible Retrieval-Augmented Generation (RAG) pipeline. This serves as our **performance baseline**. Our goal is to understand the fundamental workflow of a RAG system and to observe its inherent limitations when applied to a complex financial document.

### **Core Concept: The Classic RAG Pipeline**
We will implement the foundational "Load -> Split -> Embed -> Store -> Retrieve -> Generate" workflow.
* **Load:** Ingest the NVIDIA financial report PDF.
* **Split:** Break the document into smaller, manageable chunks.
* **Embed:** Convert each chunk into a numerical representation (vector).
* **Store:** Save these vectors in a specialized database for efficient searching.
* **Retrieve:** Given a user's query, find the most relevant chunks from the database.
* **Generate:** Pass the retrieved chunks and the original query to a Large Language Model (LLM) to generate a final answer.
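
To make the six stages concrete before introducing real libraries, here is a minimal, dependency-free sketch of the same workflow. Everything in it is an illustrative assumption: the `document` text is made up, `embed` is a toy bag-of-words counter rather than a neural embedding model, and the "Generate" stage stops at building the prompt instead of calling an LLM.

```python
import math
import re
from collections import Counter

def split(text):
    """Split: one chunk per sentence (a stand-in for a real text splitter)."""
    return [s for s in re.split(r"(?<=\.)\s+", text.strip()) if s]

def embed(text):
    """Embed: toy bag-of-words counts, NOT a neural embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, store, k=1):
    """Retrieve: return the k chunks whose vectors are most similar to the query."""
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[0]), reverse=True)
    return [chunk for _, chunk in ranked[:k]]

def build_prompt(context_chunks, question):
    """Generate (prompt only): a real pipeline would send this string to an LLM."""
    context = "\n".join(context_chunks)
    return (
        "Answer the question based only on the following context:\n"
        f"{context}\n\nQuestion: {question}"
    )

# Load: made-up placeholder text standing in for the PDF contents.
document = (
    "Revenue grew compared with last year. "
    "Research and development expense increased. "
    "The company repurchased shares during the quarter."
)

chunks = split(document)                      # Split
store = [(embed(c), c) for c in chunks]       # Embed + Store (a plain list)
question = "How much was spent to repurchase shares?"
top_chunks = retrieve(question, store, k=1)   # Retrieve
prompt = build_prompt(top_chunks, question)   # Generate (prompt only)
print(prompt)
```

The real pipeline in this module swaps each toy piece for a production component, but the data flow between stages is the same.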

### **Business Impact**
By testing this simple system against our stakeholder queries, we will see where it succeeds and, more importantly, where it fails. This provides a clear, data-driven justification for the more advanced techniques we will implement in later modules to improve accuracy and reliability.


In [None]:
### **Step 1: Install Dependencies**
First, we install all the necessary open-source libraries.


In [None]:
!pip install -q langchain langchain-community langchain-groq qdrant-client langchain_qdrant langchain_huggingface pypdf


In [None]:
### **Step 2: Set Up API Keys**
We need to configure our API keys for Groq (LLM) and Hugging Face (embeddings). Please add your keys to the Colab secrets manager (key icon on the left) with the names `GROQ_API_KEY` and `HF_TOKEN`.


In [None]:
import os
from google.colab import userdata

# Set up the API keys
os.environ["GROQ_API_KEY"] = userdata.get('GROQ_API_KEY')
os.environ["HF_TOKEN"] = userdata.get('HF_TOKEN')

print("API keys set.")


In [None]:
### **Step 3: Load and Split the Document**
Here, we perform the first two stages of our RAG pipeline: **Load** and **Split**. We'll load the NVIDIA PDF and use a `RecursiveCharacterTextSplitter` to create text chunks that are small enough to be processed effectively.
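
To build intuition for `chunk_size` and `chunk_overlap`, the sketch below implements a plain sliding-window chunker. It is a simplification, not how `RecursiveCharacterTextSplitter` works internally (the real splitter prefers to break on separators such as paragraphs and sentences); the function name `chunk_with_overlap` is hypothetical.

```python
def chunk_with_overlap(text, chunk_size, chunk_overlap):
    """Cut text into windows of chunk_size characters that share
    chunk_overlap characters with the previous window."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks

sample = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_with_overlap(sample, chunk_size=10, chunk_overlap=4)
for c in chunks:
    print(c)
```

Each chunk repeats the last four characters of the previous one. The overlap exists so that a sentence or figure falling on a chunk boundary still appears whole in at least one chunk.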


In [None]:
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load the PDF
# Make sure you have uploaded the NVIDIA Q1 FY26 earnings PDF to your Colab session
pdf_path = "./NVIDIA-Q1-FY26-Financial-Results.pdf"
loader = PyPDFLoader(pdf_path)
documents = loader.load()

# Split the document into chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
docs = text_splitter.split_documents(documents)

print(f"Successfully loaded and split the document into {len(docs)} chunks.")


In [None]:
### **Step 4: Embed and Store in Vector Database**
Now for the **Embed** and **Store** stages. We will use the powerful `bge-m3` model from Hugging Face to create vector embeddings for our chunks. These vectors will be stored in a **Qdrant** vector database running entirely in memory, which is perfect for our notebook environment.


In [None]:
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_qdrant import QdrantVectorStore
from qdrant_client import QdrantClient
from qdrant_client.http.models import Distance, VectorParams

# 1. Initialize our embedding model
model_name = "BAAI/bge-m3"
model_kwargs = {"device": "cpu"}
encode_kwargs = {"normalize_embeddings": True}

embedding_model = HuggingFaceEmbeddings(
    model_name=model_name,
    model_kwargs=model_kwargs,
    encode_kwargs=encode_kwargs,
)

collection_name = "nvidia_earnings"

# 2. Create a Qdrant client for an in-memory database
client = QdrantClient(":memory:")

# 3. Create the collection in Qdrant if it does not exist
if not client.collection_exists(collection_name=collection_name):
    # The BAAI/bge-m3 model has a vector size of 1024
    client.create_collection(
        collection_name=collection_name,
        vectors_config=VectorParams(size=1024, distance=Distance.COSINE),
    )
    print(f"Successfully created collection '{collection_name}'.")
else:
    print(f"Collection '{collection_name}' already exists.")

# 4. Instantiate the QdrantVectorStore with the existing client and collection
vectorstore = QdrantVectorStore(
    client=client,
    collection_name=collection_name,
    embedding=embedding_model,
)

# 5. Add the documents to the vector store
vectorstore.add_documents(docs)
print("Successfully embedded and stored documents in Qdrant.")


In [None]:
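For unit-length vectors, cosine similarity and dot product are the same number. Setting `normalize_embeddings=True` in `HuggingFaceEmbeddings` makes every embedding unit-length before it reaches Qdrant, so a cosine-similarity search reduces to a plain dot product, which vector databases like Qdrant are heavily optimized to compute. Qdrant's documentation describes its cosine distance as a dot product over normalized vectors, so normalizing on the client side keeps the stored vectors consistent with that computation. Normalizing embeddings this way is a common practice.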
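
The relationship between normalization, cosine similarity, and dot product can be checked numerically. The short sketch below uses two made-up 2-D vectors and only the standard library; real embeddings from `bge-m3` have 1024 dimensions, but the arithmetic is identical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the lengths."""
    return dot(a, b) / (norm(a) * norm(b))

def normalize(a):
    """Scale a vector to unit length."""
    n = norm(a)
    return [x / n for x in a]

# Made-up example vectors.
a = [3.0, 4.0]
b = [4.0, 3.0]

cos_ab = cosine(a, b)                         # cosine on the raw vectors
dot_unit = dot(normalize(a), normalize(b))    # dot product on unit vectors

print(cos_ab, dot_unit)
```

Both values agree (up to floating-point rounding), which is why a store holding unit-length vectors can answer cosine queries with dot products alone.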


In [None]:
### **Step 5: Initialize the RAG Chain**
This is where we tie everything together. We'll set up our Groq LLM, create a retriever to fetch documents from Qdrant, and build the final RAG chain using LangChain Expression Language (LCEL).
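
One detail worth knowing before building the chain: a retriever returns a list of document objects, and placing that list directly into a prompt inserts its raw string representation, metadata included. A common pattern is to join the chunk texts into a single context string first. The sketch below assumes a hypothetical `Doc` dataclass as a stand-in for LangChain's `Document` (same `page_content` and `metadata` attributes) so that it runs without any dependencies; `format_docs` is likewise an illustrative helper, not part of the notebook's code.

```python
from dataclasses import dataclass, field

@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def format_docs(docs):
    """Join chunk texts into one context string, tagging each with its page."""
    return "\n\n".join(
        f"[page {d.metadata.get('page', '?')}] {d.page_content}" for d in docs
    )

# Made-up chunks for illustration.
example_docs = [
    Doc("Revenue text.", {"page": 0}),
    Doc("Expense text.", {"page": 3}),
]
context = format_docs(example_docs)
print(context)
```

Because LCEL accepts plain functions in a pipe (the chain below already does this with `retriever | print_retrieved_docs`), a helper like this could be slotted in the same way, for example `retriever | format_docs`.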


In [None]:
from langchain_groq import ChatGroq
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

# Initialize the Groq LLM
llm = ChatGroq(temperature=0, model_name="meta-llama/llama-4-scout-17b-16e-instruct")

# Create the retriever
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})

# Create the prompt template
prompt_template = """
Answer the question based only on the following context:
{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(prompt_template)

# Build the RAG chain using LCEL
rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

# Add a step to print the retrieved documents
def print_retrieved_docs(docs):
    print("Retrieved Documents:")
    for i, doc in enumerate(docs):
        print(f"--- Document {i+1} ---")
        print(doc.page_content)
        print(f"Source: {doc.metadata.get('source')}, Page: {doc.metadata.get('page')}")
        print("-" * 20)
    return docs

rag_chain_with_printing = (
    {"context": retriever | print_retrieved_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

print("RAG chain initialized successfully.")


In [None]:
### **Step 6: Test the Baseline System**
It's time to evaluate our baseline system. We will ask the four critical stakeholder queries and see how it performs. This will reveal the strengths and weaknesses of a simple semantic-search-based RAG pipeline.
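
Reading answers by eye works for four queries, but a small automated check makes the baseline easier to compare against later modules. The sketch below is one possible approach, not part of the original pipeline: `contains_all` is a hypothetical helper that tests whether an answer mentions the expected figure, and the expected figures are the ones quoted in this module's results analysis. The sample answers are written for illustration only.

```python
def contains_all(answer, expected_terms):
    """Return True if every expected term appears in the answer.
    Comparison ignores case and thousands separators (3,989 == 3989)."""
    normalized = answer.lower().replace(",", "")
    return all(term.lower().replace(",", "") in normalized for term in expected_terms)

# Expected figures, taken from the results analysis in this module.
expected = {
    "R&D expense": ["3989"],
    "H20 charge": ["$4.5 billion"],
    "Share repurchases": ["$14.5 billion"],
}

# Illustrative answers (assumptions, not captured model output).
sample_answers = {
    "R&D expense": "Research and development expense was $3,989 million.",
    "H20 charge": "NVIDIA incurred a $4.5 billion charge related to H20.",
    "Share repurchases": "The document does not contain information about share repurchases.",
}

results = {name: contains_all(sample_answers[name], terms) for name, terms in expected.items()}
for name, passed in results.items():
    print(f"{name}: {'PASS' if passed else 'FAIL'}")
```

A substring check like this is crude (it cannot judge reasoning or wording), but it is enough to flag an answer that omits the expected figure.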


In [None]:
# Our stakeholder-driven test queries, derived from the Q1 FY26 report.
# Queries 1-3 are commented out; uncomment them to run all four.
queries = [
    # 1. For the Financial Analyst (requires nuanced understanding)
    #"What were the drivers of the year-over-year increase in Compute & Networking segment operating income for Q1 FY26?",

    # 2. For the Portfolio Manager (requires table data extraction)
    #"What was the Research and development expense for the three months ended April 27, 2025?",

    # 3. For the CIO/Risk Officer (requires specific fact retrieval)
    #"What was the total charge incurred in Q1 FY2026 related to H20 excess inventory and purchase obligations?",

    # 4. For the Retail Investor (requires specific fact retrieval)
    "How much did NVIDIA spend on share repurchases in the first quarter of fiscal year 2026?"
]

# Run the queries through our RAG chain
for query in queries:
    print(f"Query: {query}\n")
    answer = rag_chain_with_printing.invoke(query) # Use the chain with printing
    print(f"Answer: {answer}\n")
    print("-" * 50)


In [None]:
## **Module 1 Conclusion: Analyzing the Baseline's Surprising Successes and Critical Failures**

The baseline's results are mixed. Three of the four queries were answered correctly, but the fourth failed in a way that exposes the limitations of a naive RAG approach.

---

### Analysis of Results

- **The Successes (Analyst, Portfolio Manager & CIO Queries):**

  - **Analyst & CIO Queries**: The system performed exceptionally well, correctly answering the nuanced question about operating income drivers and the specific query about the $4.5 billion charge. This indicates that when the answer is contained in a clear, semantically distinct sentence, the basic retriever works effectively.

  - **The Brittle Success (Portfolio Manager's R&D Query)**: The system's ability to correctly pull the R&D expense ($3,989 million) from a table is a surprising success. However, this should be viewed as **unreliable and likely coincidental**. The simple `PyPDFLoader` does not truly understand table structures; it merely extracted the text in an order that, by chance, kept the "Research and development" string close enough to its value for the LLM to connect them. This approach is fragile and would fail with more complex tables or comparative queries.

- **The Critical Failure (Retail Investor's Share Repurchase Query):**

  - This is our most important learning point. The system confidently stated that the document "**does not contain information about share repurchases**", which is factually incorrect. The report explicitly states on page 28:  
    *"We repurchased 126 million… shares… for $14.5 billion… during the first quarter of fiscal years 2026".*

  - **Diagnosis**: This is a classic **Retrieval Failure**. The data was loaded and chunked, but the retriever failed to identify the chunk containing the answer as being semantically relevant to the query "How much did NVIDIA spend on share repurchases…".

  **The Reason: Why the Retriever Failed**

  The retriever prioritized chunks with high general semantic relevance over the chunk with the specific factual answer. The top 4 retrieved documents were all about NVIDIA's Q1 FY26 finances but lacked the keywords "share repurchases." The correct chunk, which was part of a table on page 6, was ranked lower and thus missed by the k=4 cutoff.

  **While it is tempting to simply increase k to 10 to get the answer**, this is not a robust solution for a production system.

  - **It's inefficient:** Every extra retrieved chunk lengthens the prompt, which raises token cost and latency for every query.

  - **It can reduce accuracy:** LLMs suffer from a "lost in the middle" problem. Burying the single correct chunk among many irrelevant ones increases the chance the LLM will overlook it, potentially leading to more incorrect answers.

  - **It doesn't fix the root cause:** A production system cannot rely on luck. The core issue is an imprecise retriever. The goal is to rank the best chunks at the top, not to retrieve more chunks and hope for the best.
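
The effect of the `k` cutoff can be illustrated with a toy ranking. The chunk labels and similarity scores below are hypothetical, chosen only to mirror the situation described above; they were not measured from the notebook run.

```python
# Hypothetical (label, similarity score) pairs, for illustration only.
scored_chunks = [
    ("overview of quarterly results", 0.71),
    ("revenue by market platform", 0.69),
    ("outlook for next quarter", 0.66),
    ("gross margin commentary", 0.64),
    ("operating expenses table", 0.61),
    ("share repurchase table row", 0.58),
]

def top_k(chunks, k):
    """Return the labels of the k highest-scoring chunks."""
    ranked = sorted(chunks, key=lambda c: c[1], reverse=True)
    return [label for label, _ in ranked[:k]]

def rank_of(chunks, target):
    """1-based position of the target chunk in the full ranking."""
    return top_k(chunks, len(chunks)).index(target) + 1

target = "share repurchase table row"
print("rank of answer chunk:", rank_of(scored_chunks, target))
print("found with k=4: ", target in top_k(scored_chunks, 4))
print("found with k=10:", target in top_k(scored_chunks, 10))
```

Raising `k` does pull the answer chunk into the context, but its rank has not changed. Improving that rank is the retrieval-quality problem the later modules target.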

---

### Key Takeaway

This baseline RAG, while a useful starting point, is too unpredictable and brittle for production deployment. Its successes feel coincidental, while its failures are critical. The silent retrieval failure, **confidently reporting that information is absent when it is present**, is a dangerous flaw that erodes user trust and demonstrates the system's fundamental lack of robustness.

---

### Next Up:

The retrieval failure on the share repurchase query gives us a clear mission.  
We need to make our retriever more robust. **In Module 2**, we will directly address this by implementing **Hybrid Search**, which combines semantic search with keyword matching. This technique is specifically designed to prevent failures on queries containing precise terms like "share repurchases."

---
