# **AI-Powered Financial Analysis System - Project Prerequisites**

This document outlines the necessary setup to complete all modules of the RAG learning roadmap. Please configure these items before starting Module 1.

### **Required API Keys**

You will need to create free accounts for the following services and obtain API keys. We will load these into our Google Colab environment using the secrets manager.

1.  **Groq API Key**
    * **Purpose:** Provides access to high-speed LLM inference.
    * **Get it here:** [https://console.groq.com/keys](https://console.groq.com/keys)

2.  **Hugging Face User Access Token**
    * **Purpose:** Allows us to download models and use the Hugging Face ecosystem. A `read` role is sufficient.
    * **Get it here:** [https://huggingface.co/settings/tokens](https://huggingface.co/settings/tokens)

3.  **Tavily API Key**
    * **Purpose:** Required for web search capabilities in **Module 9**.
    * **Get it here:** [https://app.tavily.com](https://app.tavily.com)

### **Required Files & Environment**

1.  **Environment:** Google Colab.
2.  **Source Document:** The **NVIDIA Q1 FY26 Earnings Report** PDF.
    * **Action:** Download the press release PDF from the official NVIDIA news site: [NVIDIA Q1 FY2026 Financial Results](https://investor.nvidia.com/financial-info/quarterly-results/default.aspx).
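    * **Direct PDF download link:** https://s201.q4cdn.com/141608511/files/doc_financials/2026/q1/b6df1c5c-5cb6-4a41-9d28-dd1bcd34cc26.pdf
    * You will need to upload this file (`.pdf`) to your Colab session at the start of each module. The notebook code expects it at `./NVIDIA-Q1-FY26-Financial-Results.pdf`, so rename the download or adjust `pdf_path` to match.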
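
Before running a module, it can help to confirm that the upload actually landed where the code expects it. The helper below is a minimal sketch, not part of the original notebook: `looks_like_pdf` is a hypothetical name, and it only checks that the file exists and begins with the standard `%PDF-` signature.

```python
from pathlib import Path


def looks_like_pdf(path):
    """Return True if `path` is an existing file that starts with the PDF signature."""
    p = Path(path)
    if not p.is_file():
        return False
    with p.open("rb") as f:
        # PDF files begin with the bytes "%PDF-" followed by a version number.
        return f.read(5) == b"%PDF-"


# Hypothetical usage in Colab, with the path used later in Module 1:
# looks_like_pdf("./NVIDIA-Q1-FY26-Financial-Results.pdf")
```

A `False` result usually means the file was not uploaded in this session or was saved under a different name.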

---

# **Module 1: The Foundation - Basic RAG**

### **Objective**
In this module, we will build the simplest possible Retrieval-Augmented Generation (RAG) pipeline. This serves as our **performance baseline**. Our goal is to understand the fundamental workflow of a RAG system and to observe its inherent limitations when applied to a complex financial document.

### **Core Concept: The Classic RAG Pipeline**
We will implement the foundational "Load -> Split -> Embed -> Store -> Retrieve -> Generate" workflow.
* **Load:** Ingest the NVIDIA financial report PDF.
* **Split:** Break the document into smaller, manageable chunks.
* **Embed:** Convert each chunk into a numerical representation (vector).
* **Store:** Save these vectors in a specialized database for efficient searching.
* **Retrieve:** Given a user's query, find the most relevant chunks from the database.
* **Generate:** Pass the retrieved chunks and the original query to a Large Language Model (LLM) to generate a final answer.
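
The six stages above can be sketched end to end in plain Python. This is a toy illustration under loud assumptions, not the notebook's implementation: the "document" is three made-up sentences, `embed` is a bag-of-words counter rather than a neural model, the "store" is a Python list, and the Generate stage stops at building the prompt that an LLM would receive.

```python
import math
import re
from collections import Counter


def split(text):
    """Split: break the text into sentence-sized chunks."""
    return [s for s in re.split(r"(?<=\.)\s+", text.strip()) if s]


def embed(text):
    """Embed: toy bag-of-words vector (word -> count)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(store, query, k=1):
    """Retrieve: return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(chunks, question):
    """Generate (first half): assemble the prompt an LLM would answer."""
    context = "\n".join(chunks)
    return (
        "Answer the question based only on the following context:\n"
        f"{context}\n\nQuestion: {question}"
    )


# Load: a made-up stand-in for the text extracted from the PDF.
document = (
    "Revenue grew strongly in the data center segment. "
    "Research and development expense increased during the quarter. "
    "The company repurchased shares of its common stock."
)
chunks = split(document)
store = [(chunk, embed(chunk)) for chunk in chunks]  # Store
question = "What happened to research and development expense?"
top = retrieve(store, question, k=1)
prompt = build_prompt(top, question)
print(prompt)
```

The real pipeline replaces each toy function with a library component, but the data flow between the stages is the same.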

### **Business Impact**
By testing this simple system against our stakeholder queries, we will see where it succeeds and, more importantly, where it fails. This provides a clear, data-driven justification for the more advanced techniques we will implement in later modules to improve accuracy and reliability.

---
### **Step 1: Install Dependencies**
First, we install all the necessary open-source libraries.

In [None]:
!pip install -q langchain langchain-community langchain-groq qdrant-client sentence-transformers pypdf

---
### **Step 2: Set Up API Keys**
We need to configure our API keys for Groq (LLM) and Hugging Face (embeddings). Please add your keys to the Colab secrets manager (key icon on the left) with the names `GROQ_API_KEY` and `HF_TOKEN`.

In [None]:
import os
from google.colab import userdata

# Set up the API keys
os.environ["GROQ_API_KEY"] = userdata.get('GROQ_API_KEY')
os.environ["HF_TOKEN"] = userdata.get('HF_TOKEN')

print("API keys set.")

API keys set.
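
The cell above prints "API keys set." without checking the values themselves. As a hedged sketch, the hypothetical helper `missing_keys` below reports which required names are absent or blank; it is demonstrated on a plain dictionary with placeholder values, and in the notebook you would pass `os.environ` instead.

```python
def missing_keys(env, required=("GROQ_API_KEY", "HF_TOKEN")):
    """Return the names in `required` that are absent or blank in `env`."""
    return [name for name in required if not (env.get(name) or "").strip()]


# A plain dict stands in for os.environ; the values are placeholders.
example_env = {"GROQ_API_KEY": "placeholder-value", "HF_TOKEN": ""}
print(missing_keys(example_env))  # ['HF_TOKEN']
```

An empty list means both secrets were found and are non-blank.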


---
### **Step 3: Load and Split the Document**
Here, we perform the first two stages of our RAG pipeline: **Load** and **Split**. We'll load the NVIDIA PDF and use a `RecursiveCharacterTextSplitter` to create text chunks that are small enough to be processed effectively.

In [None]:
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load the PDF
# Make sure you have uploaded the NVIDIA Q1 FY26 earnings PDF to your Colab session
pdf_path = "./NVIDIA-Q1-FY26-Financial-Results.pdf"
loader = PyPDFLoader(pdf_path)
documents = loader.load()

# Split the document into chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
docs = text_splitter.split_documents(documents)

print(f"Successfully loaded and split the document into {len(docs)} chunks.")

Successfully loaded and split the document into 191 chunks.
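
To see what `chunk_size` and `chunk_overlap` mean, the sketch below uses a simplified fixed-window splitter. It is an assumption-laden stand-in: `RecursiveCharacterTextSplitter` additionally tries to break on separators such as paragraphs and sentences, which this hypothetical `sliding_chunks` function does not attempt.

```python
def sliding_chunks(text, chunk_size=1000, chunk_overlap=200):
    """Cut `text` into windows of `chunk_size` characters that share
    `chunk_overlap` characters with the previous window."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


demo = sliding_chunks("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=4)
print(demo)  # ['abcdefghij', 'ghijklmnop', 'mnopqrstuv', 'stuvwxyz']
```

Each chunk repeats the last four characters of the previous one, so a sentence that straddles a boundary is still seen whole in at least one chunk when it is shorter than the overlap.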


---
### **Step 4: Embed and Store in Vector Database**
Now for the **Embed** and **Store** stages. We will use the `BAAI/bge-m3` model from Hugging Face to create vector embeddings for our chunks; the first run downloads about 2.27 GB of model weights, so expect a delay. These vectors will be stored in a **Qdrant** vector database running entirely in memory, which suits a notebook environment but means the index is lost when the session ends.

In [None]:
from langchain_community.embeddings import HuggingFaceBgeEmbeddings
from langchain_community.vectorstores import Qdrant

# Initialize our embedding model
model_name = "BAAI/bge-m3"
model_kwargs = {"device": "cpu"} # Use CPU for embedding, can be changed to "cuda" if GPU is available
encode_kwargs = {"normalize_embeddings": True}
embedding_model = HuggingFaceBgeEmbeddings(
    model_name=model_name,
    model_kwargs=model_kwargs,
    encode_kwargs=encode_kwargs,
)

# Store the embedded documents in a Qdrant vector store
vectorstore = Qdrant.from_documents(
    docs,
    embedding_model,
    location=":memory:",  # Create an in-memory Qdrant instance
    collection_name="nvidia_earnings",
)

print("Successfully embedded documents and stored them in Qdrant.")

Successfully embedded documents and stored them in Qdrant.
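
The `normalize_embeddings=True` setting in the cell above scales every vector to unit length. A small worked example, using made-up two-dimensional vectors, shows why that is convenient: for unit-length vectors the dot product equals the cosine similarity, assuming the vector store compares vectors by cosine similarity.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale `v` to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm else list(v)


def cosine(a, b):
    """Cosine similarity computed from the raw (unnormalized) vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a, b = [3.0, 4.0], [4.0, 3.0]          # made-up vectors, both of length 5
similarity = cosine(a, b)               # 24 / 25
unit_dot = dot(normalize(a), normalize(b))
print(round(similarity, 2), round(unit_dot, 2))  # 0.96 0.96
```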


---
### **Step 5: Initialize the RAG Chain**
This is where we tie everything together. We'll set up our Groq LLM, create a retriever to fetch documents from Qdrant, and build the final RAG chain using LangChain Expression Language (LCEL).

In [None]:
from langchain_groq import ChatGroq
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

# Initialize the Groq LLM
llm = ChatGroq(temperature=0, model_name="meta-llama/llama-4-scout-17b-16e-instruct")

# Create the retriever
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})

# Create the prompt template
prompt_template = """
Answer the question based only on the following context:
{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(prompt_template)

# Build the RAG chain using LCEL
rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

print("RAG chain initialized successfully.")

RAG chain initialized successfully.
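
The LCEL expression can be read as a sequence of function calls: the dictionary step sends the question both to the retriever and straight through, the prompt fills its template, and the LLM answers. The plain-Python analogue below is a sketch of that data flow only, not of LangChain's internals. `make_chain`, `fake_retriever`, and `echo_llm` are hypothetical stand-ins, and `format_docs` is an added assumption: the notebook's chain inserts the retriever output into the prompt without joining it first.

```python
PROMPT_TEMPLATE = (
    "Answer the question based only on the following context:\n"
    "{context}\n\nQuestion: {question}\n"
)


def format_docs(chunks):
    """Join retrieved chunk texts into a single context string."""
    return "\n\n".join(chunks)


def make_chain(retriever, llm, template=PROMPT_TEMPLATE):
    """Compose retriever -> prompt -> llm into one callable."""
    def chain(question):
        context = format_docs(retriever(question))  # the "context" branch
        prompt = template.format(context=context, question=question)
        return llm(prompt)
    return chain


def fake_retriever(question):
    """Stand-in retriever that always returns two made-up chunks."""
    return ["Chunk A.", "Chunk B."]


def echo_llm(prompt):
    """Stand-in LLM that returns the prompt it received."""
    return prompt


toy_chain = make_chain(fake_retriever, echo_llm)
result = toy_chain("What is X?")
print(result)
```

Echoing the prompt is a cheap way to inspect exactly what context the model would have seen for a given question.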


---
### **Step 6: Test the Baseline System**
It's time to evaluate our baseline system. We will ask the four critical stakeholder queries and see how it performs. This will reveal the strengths and weaknesses of a simple semantic-search-based RAG pipeline.

In [None]:
# Our stakeholder-driven test queries, derived from the Q1 FY26 report
queries = [
    # 1. For the Financial Analyst (requires nuanced understanding)
    "What were the drivers of the year-over-year increase in Compute & Networking segment operating income for Q1 FY26?",

    # 2. For the Portfolio Manager (requires table data extraction)
    "What was the Research and development expense for the three months ended April 27, 2025?",

    # 3. For the CIO/Risk Officer (requires specific fact retrieval)
    "What was the total charge incurred in Q1 FY2026 related to H20 excess inventory and purchase obligations?",

    # 4. For the Retail Investor (requires specific fact retrieval)
    "How much did NVIDIA spend on share repurchases in the first quarter of fiscal year 2026?"
]

# Run the queries through our RAG chain
for query in queries:
    print(f"Query: {query}\n")
    answer = rag_chain.invoke(query)
    print(f"Answer: {answer}\n")
    print("-" * 50)

Query: What were the drivers of the year-over-year increase in Compute & Networking segment operating income for Q1 FY26?

Answer: The year-over-year increase in Compute & Networking segment operating income in the first quarter of fiscal year 2026 was driven by growth in revenue, partially offset by a $4.5 billion charge associated with H20 excess inventory and purchase obligations.

--------------------------------------------------
Query: What was the Research and development expense for the three months ended April 27, 2025?

Answer: The Research and development expenses for the three months ended April 27, 2025 was $3,989 million.

--------------------------------------------------
Query: What was the total charge incurred in Q1 FY2026 related to H20 excess inventory and purchase obligations?

Answer: The total charge incurred in Q1 FY2026 related to H20 excess inventory and purchase obligations was $4.5 billion. This charge was incurred as a result of new export licensing require

## **Module 1 Conclusion: Analyzing the Baseline’s Surprising Successes and Critical Failures**

The baseline's results are instructive. It answered three of the four stakeholder queries correctly, and it failed on the fourth in a way that highlights the limitations of a naive RAG approach.

---

### Analysis of Results

- **The Successes (Analyst, Portfolio Manager & CIO Queries):**

  - **Analyst & CIO Queries**: The system performed exceptionally well, correctly answering the nuanced question about operating income drivers and the specific query about the $4.5 billion charge. This indicates that when the answer is contained in a clear, semantically distinct sentence, the basic retriever works effectively.

  - **The Brittle Success (Portfolio Manager’s R&D Query)**: The system’s ability to correctly pull the R&D expense ($3,989 million) from a table is a surprising success. However, this should be viewed as **unreliable and likely coincidental**. The simple `PyPDFLoader` does not truly understand table structures; it merely extracted the text in an order that, by chance, kept the "Research and development" string close enough to its value for the LLM to connect them. This approach is fragile and would fail with more complex tables or comparative queries.

- **The Critical Failure (Retail Investor’s Share Repurchase Query):**

  - This is our most important learning point. The system confidently stated that the document "**does not contain information about share repurchases**", which is factually incorrect. The report explicitly states on page 28:  
    *"We repurchased 126 million… shares… for $14.5 billion… during the first quarter of fiscal years 2026".*

  - **Diagnosis**: This is a classic **Retrieval Failure**. The data was loaded and chunked, but the retriever failed to identify the chunk containing the answer as being semantically relevant to the query "How much did NVIDIA spend on share repurchases…". With `k=4`, the retriever returns its four nearest chunks whether or not they are relevant, so the LLM received context that did not contain the answer and correctly (but misleadingly) reported that the information was not available.
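
One way to confirm a retrieval failure, rather than a loading or generation failure, is to check where the expected terms appear: in the full set of chunks but not in the retrieved ones. The sketch below uses a hypothetical helper, `chunks_mentioning`, on made-up strings; it is not output from the notebook.

```python
def chunks_mentioning(chunks, terms):
    """Return indices of chunks containing every term (case-insensitive)."""
    lowered = [t.lower() for t in terms]
    return [
        i for i, chunk in enumerate(chunks)
        if all(t in chunk.lower() for t in lowered)
    ]


# Made-up stand-ins for the retrieved chunks and the full chunk list.
retrieved = [
    "Revenue was up from a year ago.",
    "Gross margin was affected by an inventory charge.",
]
all_chunks = retrieved + ["We repurchased shares of our common stock."]

in_retrieved = chunks_mentioning(retrieved, ["repurchased"])
in_corpus = chunks_mentioning(all_chunks, ["repurchased"])
print(in_retrieved, in_corpus)  # [] [2]
```

In the notebook, the same check could be run with `[d.page_content for d in retriever.invoke(query)]` as the retrieved list and `[d.page_content for d in docs]` as the full list. A hit in the corpus with no hit among the retrieved chunks points at the retriever.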

---

### Key Takeaway

Our baseline system is **unpredictable and not production-ready**. Its success feels more like luck than robust design. The silent retrieval failure is particularly dangerous, as it can mislead a user into believing information is absent when it is, in fact, present.  
This single failure demonstrates why we cannot deploy such a simple system.

---

### Next Up:

The retrieval failure on the share repurchase query gives us a clear mission.  
We need to make our retriever more robust. **In Module 2**, we will directly address this by implementing **Hybrid Search**, which combines semantic search with keyword matching. This technique is specifically designed to prevent failures on queries containing precise terms like "share repurchases."
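
As a preview of the idea, the sketch below merges a semantic ranking and a keyword ranking with reciprocal rank fusion (RRF). This is an assumption: RRF is one common way to combine rankings, and Module 2 may use a different fusion method. The chunk IDs and both rankings are made up for illustration.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of IDs; each appearance adds 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


semantic_ranking = ["c1", "c2", "c3"]  # made-up result of vector search
keyword_ranking = ["c4", "c1", "c2"]   # made-up result of keyword search
fused = reciprocal_rank_fusion([semantic_ranking, keyword_ranking])
print(fused)  # ['c1', 'c2', 'c4', 'c3']
```

Here `c4`, which only the keyword search found, ends up ahead of `c3`, so an exact-term match that semantic search missed can still reach the LLM.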

---

