
### **Module 1: The Basic RAG**

**Objective:**
In this module, we will build the simplest possible Retrieval-Augmented Generation (RAG) pipeline. This serves as our **performance baseline**. Our goal is to understand the fundamental workflow of a RAG system and to observe its inherent limitations when applied to a complex financial document.

**Core Concept: The Classic RAG Pipeline**
We will implement the foundational "Load -> Split -> Embed -> Store -> Retrieve -> Generate" workflow to build our AI Financial Analyst.
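
To make the six stages concrete before bringing in any libraries, here is a minimal, dependency-free sketch. Everything in it is a stand-in chosen for illustration: the fruit sentences are toy data, `embed` is a word-count "embedding" rather than a real model, and `generate` only assembles the prompt an LLM would receive instead of calling one.

```python
import math
import re
from collections import Counter

# Load: a hard-coded string stands in for text extracted from a PDF (toy data).
raw_text = "Apples are red. Bananas are yellow. Limes are green."

# Split: one chunk per sentence. A real splitter works on character counts.
chunks = [s.strip() for s in raw_text.split(". ") if s.strip()]


def embed(text):
    """Toy embedding: lowercase word counts. A real model returns dense vectors."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Embed + Store: keep each chunk next to its vector in a plain list.
store = [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(question, k=1):
    """Retrieve: return the k chunks most similar to the question."""
    q_vec = embed(question)
    ranked = sorted(store, key=lambda item: cosine(q_vec, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def generate(question, context_chunks):
    """Generate: stand-in for the LLM call; it only builds the prompt text."""
    context = "\n".join(context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


question = "Which fruit is yellow?"
answer = generate(question, retrieve(question))
print(answer)
```

The rest of this module swaps each stand-in for a real component: a PDF loader, a text splitter, an embedding model, Qdrant, and an LLM.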

### Learning Objectives

By the end of this module, you will be able to:

  * **Understand the end-to-end workflow** of a fundamental RAG system.
  * **Implement the six core stages:** Load, Split, Embed, Store, Retrieve, and Generate.
  * **Use LangChain** to build a pipeline connecting a vector database (Qdrant) to an LLM served through Groq (Llama 3 8B).
  * **Perform semantic search** to retrieve document chunks based on conceptual meaning.
  * **Analyze the output** to identify the limitations and critical failure points of a basic RAG approach.

-----

#### **Step 1: Install Dependencies**

First, we install all the necessary open-source libraries.

In [None]:
!pip install -q langchain langchain-community langchain-groq qdrant-client sentence-transformers pypdf

-----

#### **Step 2: Set Up API Key & Document Loading**

This step reads our Groq API key from Colab secrets, loads the NVIDIA earnings PDF, and splits it into chunks of up to 1,000 characters with a 200-character overlap.

In [None]:
import os
from google.colab import userdata

# Set up the API key from Colab secrets
os.environ["GROQ_API_KEY"] = userdata.get('GROQ_API_KEY')

# --- Document Loading and Splitting ---
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Make sure you have uploaded the NVIDIA Q1 FY26 earnings PDF
pdf_path = "./NVIDIA-Q1-FY26-Financial-Results.pdf"
loader = PyPDFLoader(pdf_path)
documents = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
docs = text_splitter.split_documents(documents)

print(f"Document loaded and split into {len(docs)} chunks.")
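
To see what `chunk_size` and `chunk_overlap` mean, the sketch below implements a simplified fixed-window splitter. It is not how `RecursiveCharacterTextSplitter` works internally: the real splitter prefers to break on separators such as paragraphs, lines, and spaces, so its chunks are often shorter than `chunk_size`. The function name `split_with_overlap` is made up for this illustration.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Cut text into fixed-size windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
pieces = split_with_overlap(sample, chunk_size=10, chunk_overlap=4)
for piece in pieces:
    print(piece)
```

Each window repeats the last four characters of the previous one. The overlap exists so that a sentence cut at a chunk boundary still appears whole in at least one chunk.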

-----

#### **Step 3: Embed and Store in Vector Database**

Now, we will create vector embeddings for our text chunks and store them in our Qdrant vector database.

In [None]:
from langchain_community.embeddings import HuggingFaceBgeEmbeddings
from langchain_community.vectorstores import Qdrant

# YOUR CODE HERE (Part 1)
# Initialize the HuggingFaceBgeEmbeddings model.
# Use the model name "BAAI/bge-m3".
# Set model_kwargs to run on the "cpu" device and encode_kwargs to normalize embeddings.
embedding_model = ...

print("Embedding model initialized.")

# YOUR CODE HERE (Part 2)
# Use the Qdrant.from_documents() class method to create the vector store.
# This single command will create the embeddings for all 'docs' and store them.
# It needs the 'docs', the 'embedding_model', an in-memory 'location', and a 'collection_name'.
vectorstore = ...

print("Successfully embedded documents and stored them in Qdrant.")
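
Why normalize the embeddings? Once every vector has length 1, a plain dot product equals cosine similarity, so the vector store can rank chunks by direction (meaning) rather than by vector magnitude. The sketch below checks this with small hand-made 2-D vectors; they are illustrative values, not real model output.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returns a copy if the vector is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity computed from the raw (unnormalized) vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


query = [3.0, 4.0]
chunk_a = [6.0, 8.0]   # same direction as the query, twice the length
chunk_b = [4.0, -3.0]  # perpendicular to the query

for name, chunk in [("chunk_a", chunk_a), ("chunk_b", chunk_b)]:
    raw = cosine(query, chunk)
    normalized = dot(l2_normalize(query), l2_normalize(chunk))
    print(f"{name}: cosine={raw:.3f}, dot of normalized vectors={normalized:.3f}")
```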

-----

#### **Step 4: Initialize the RAG Chain**

This is where we tie everything together using LangChain Expression Language (LCEL) to create our final pipeline.

In [None]:
from langchain_groq import ChatGroq
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

# YOUR CODE HERE (Part 1)
# Initialize the ChatGroq LLM with the 'llama3-8b-8192' model and a temperature of 0.
llm = ...

# Initialize a retriever from the vectorstore you created in the previous step.
# HINT: Use the .as_retriever() method on your vectorstore object.
retriever = ...

# Create the prompt template string.
# It should instruct the LLM to answer based *only* on the {context}, and it needs a {question} placeholder.
prompt_template_string = """
Answer the question based only on the following context:

{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(prompt_template_string)


# YOUR CODE HERE (Part 2)
# Build the RAG chain using LCEL and the pipe operator (|).
# The first part should be a dictionary that prepares the context and question.
# Then, pipe it to the prompt, then the llm, and finally to the StrOutputParser.
rag_chain = (
    ...
    | prompt
    | llm
    | StrOutputParser()
)

print("RAG chain initialized successfully.")
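
If the pipe syntax looks unfamiliar, the toy below shows the general idea behind chaining steps with `|`: Python lets a class define `__or__`, and each `a | b` produces a new step that runs `a` and feeds its output to `b`. This is only an illustration of the pattern, not LangChain's actual implementation; the `Step` class and all four stand-in steps are hypothetical.

```python
class Step:
    """Wraps a function so that steps can be chained with the | operator."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # a | b builds a new Step that runs a, then passes its output to b.
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.func(value)


# Hypothetical stand-ins for the four stages of the real chain.
prepare_inputs = Step(
    lambda q: {"context": f"(chunks retrieved for: {q})", "question": q}
)
fill_prompt = Step(lambda d: f"Context: {d['context']}\nQuestion: {d['question']}")
fake_llm = Step(lambda prompt_text: f"  ANSWER based on [{prompt_text}]  ")
to_text = Step(str.strip)

toy_chain = prepare_inputs | fill_prompt | fake_llm | to_text
result = toy_chain.invoke("What is RAG?")
print(result)
```

Data flows left to right: the question becomes a dictionary, the dictionary becomes prompt text, the prompt goes to the model, and the final step turns the model output into a clean string.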

-----

#### **Step 5: Test the Baseline System**

Now we run our stakeholder queries through the system to see how our baseline performs.

In [None]:
# Our stakeholder-driven test queries
queries = [
    "What were the drivers of the year-over-year increase in Compute & Networking segment operating income for Q1 FY26?",
    "What was the Research and development expense for the three months ended April 27, 2025?",
    "What was the total charge incurred in Q1 FY2026 related to H20 excess inventory and purchase obligations?",
    "How much did NVIDIA spend on share repurchases in the first quarter of fiscal year 2026?"
]

# Run the queries through our RAG chain
for query in queries:
    print(f"Query: {query}\n")
    answer = rag_chain.invoke(query)
    print(f"Answer: {answer}\n")
    print("-" * 50)
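
When an answer looks wrong, it helps to check whether the right chunk was retrieved at all before blaming the LLM. The helper below is a minimal sketch for that: `preview_retrieved` is a name invented here, and it assumes the retriever exposes `.invoke(query)` and returns documents with `page_content` and `metadata` attributes, as LangChain retrievers do. The stub classes exist only so the snippet runs on its own.

```python
from dataclasses import dataclass, field


def preview_retrieved(retriever, query, max_chars=200):
    """Return (page, snippet) pairs for the chunks retrieved for a query."""
    results = []
    for doc in retriever.invoke(query):
        page = doc.metadata.get("page", "?")  # PyPDFLoader records a "page" key
        snippet = " ".join(doc.page_content.split())[:max_chars]
        results.append((page, snippet))
    return results


# Hypothetical stand-ins so the example is self-contained.
@dataclass
class StubDoc:
    page_content: str
    metadata: dict = field(default_factory=dict)


class StubRetriever:
    def __init__(self, docs):
        self.docs = docs

    def invoke(self, query):
        return self.docs


stub = StubRetriever([
    StubDoc("Line one\n   line two", {"page": 3}),
    StubDoc("No page here"),
])
previews = preview_retrieved(stub, "any question")
for page, snippet in previews:
    print(f"[page {page}] {snippet}")
```

In the notebook, calling `preview_retrieved(retriever, queries[3])` would list the chunks passed to the LLM for the share repurchase question.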

-----

#### **Module 1 Conclusion: Analyzing the Baseline Performance**

*After running the code, analyze the output. Did it answer all questions correctly? Where did it fail? The most likely failure will be on the "share repurchases" query. This is our key learning: semantic search retrieves passages that are conceptually close to the query, but that does not guarantee the chunk containing the exact figure is among them. This limitation sets the stage for our next module.*