# RAG in Action - Part 1: Naive RAG Demo

A complete demonstration of building a basic RAG system over Apple's 2023 10-K filing.

## Step 1: Download Data

In [1]:
import sys
from pathlib import Path

# Add parent directory to path
sys.path.append(str(Path().absolute().parent))

from data.download_data import download_apple_10k

# Download Apple 10-K report
file_path = download_apple_10k()
print(f"Downloaded: {file_path}")

File already exists: /Users/kakao/Dev/personal/medium/RAG in Action/rag-in-action-series/part_01/data/apple_10k_2023.pdf
Downloaded: /Users/kakao/Dev/personal/medium/RAG in Action/rag-in-action-series/part_01/data/apple_10k_2023.pdf


## Step 2: Test LLM Limitations (Before RAG)

In [2]:
import ollama
import time

question = "What was Apple's total revenue in 2023? Please provide the exact number."

print("🚫 LLM WITHOUT RAG")
print("=" * 50)

start_time = time.time()
response = ollama.chat(
    model='llama3.1:8b',
    messages=[{'role': 'user', 'content': question}]
)
llm_time = time.time() - start_time

print(f"Question: {question}")
print(f"Answer: {response['message']['content']}")
print(f"Response time: {llm_time:.2f}s")
print(f"Source: ❌ None (relies on parameter memory only)")

llm_response = response['message']['content']

🚫 LLM WITHOUT RAG
Question: What was Apple's total revenue in 2023? Please provide the exact number.
Answer: I don't have information on Apple's revenue for 2023 as I'm a large language model, my training data only goes up to 2022 and does not include real-time updates or future financial data.

However, you can check Apple's official investor relations website (investors.apple.com) for the most recent financial reports. They release their annual reports around late January of each year.
Response time: 4.06s
Source: ❌ None (relies on parameter memory only)


## Step 3: Load and Process Documents

In [3]:
import torch
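# Note: recent LangChain releases move these imports to
# langchain_community.document_loaders and langchain_text_splitters
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter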

# Check device availability
device = 'mps' if torch.backends.mps.is_available() else 'cpu'
print(f"Using device: {device}")

# Load PDF
loader = PyPDFLoader(file_path)
documents = loader.load()
print(f"Loaded {len(documents)} pages")

# Split documents into chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    separators=["\n\n", "\n", " ", ""]
)
chunks = text_splitter.split_documents(documents)
print(f"Created {len(chunks)} chunks")

Using device: mps
Loaded 80 pages
Created 358 chunks
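
To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of a fixed-size sliding window over characters. This is a simplification, not how `RecursiveCharacterTextSplitter` works internally: the real splitter first tries to break on the separators listed above, so its chunks vary in length. The helper name `sliding_window_chunks` and the dummy text are assumptions made for illustration.

```python
def sliding_window_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap (illustrative only)."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# 2,500 characters of dummy text, not taken from the 10-K
sample = "".join(str(i % 10) for i in range(2500))
pieces = sliding_window_chunks(sample)
print([len(p) for p in pieces])
```

With these settings each window starts 800 characters after the previous one, so the last 200 characters of one chunk reappear at the start of the next. That repetition is what keeps a sentence that straddles a boundary retrievable from at least one chunk.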


## Step 4: Create Embeddings and Vector Store

In [None]:
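# Note: recent LangChain releases move these imports to
# langchain_community.embeddings and langchain_community.vectorstores
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Qdrant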
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

print("📥 Loading embedding model...")
print("⏰ First run may take 1-2 minutes for model download")

# Initialize embeddings on the device selected in Step 3 (MPS on Apple Silicon, otherwise CPU)
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    model_kwargs={'device': device},
    encode_kwargs={'normalize_embeddings': True}
)

print("✅ Embedding model loaded!")
print("🔄 Creating vector store...")

# Create vector store
start_time = time.time()

vector_store = Qdrant.from_documents(
    chunks,
    embeddings,
    location=":memory:",
    collection_name="apple_10k"
)

embedding_time = time.time() - start_time

print(f"✅ Vector store created in {embedding_time:.2f}s")
print(f"📊 Stored {len(chunks)} document chunks")
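
Because `normalize_embeddings` is set to `True`, every stored vector has unit length, so cosine similarity reduces to a plain dot product. The sketch below shows that idea with a brute-force search over toy 3-dimensional vectors. It is an illustration under stated assumptions, not a description of how Qdrant indexes or searches: the helper names `normalize`, `dot`, and `top_k` are hypothetical, and real `all-MiniLM-L6-v2` embeddings have 384 dimensions.

```python
import math


def normalize(vec: list[float]) -> list[float]:
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def top_k(query: list[float], vectors: list[list[float]], k: int = 2) -> list[tuple[int, float]]:
    """Return (index, score) pairs for the k most similar vectors, best first."""
    q = normalize(query)
    # For unit vectors the dot product equals the cosine similarity
    scored = [(i, dot(q, normalize(v))) for i, v in enumerate(vectors)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy vectors standing in for chunk embeddings (made up for illustration)
toy_vectors = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
]
results = top_k([1.0, 0.05, 0.0], toy_vectors, k=2)
print(results)
```

The retriever in the next step does the same kind of ranking, just over 358 chunk vectors and with `k` set to 5.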

## Step 5: Build RAG Pipeline

In [8]:
print("\n✅ RAG SYSTEM WITH CONTEXT")
print("=" * 50)

# Retrieve relevant documents
retriever = vector_store.as_retriever(search_kwargs={"k": 5})
docs = retriever.invoke(question)

print(f"🔍 Found {len(docs)} relevant documents")

# Combine retrieved context
context = "\n\n".join([doc.page_content for doc in docs])

# Create RAG prompt
prompt = f"""
Based on the following information from Apple's 2023 10-K filing, answer the question accurately:

Context:
{context}

Question: {question}

Answer: Please provide a specific answer based on the provided context.
"""

# Generate RAG response
start_time = time.time()
rag_response = ollama.chat(
    model='llama3.1:8b',
    messages=[{'role': 'user', 'content': prompt}]
)
rag_time = time.time() - start_time

print(f"Question: {question}")
print(f"Answer: {rag_response['message']['content']}")
print(f"Response time: {rag_time:.2f}s")
print(f"Source: ✅ {len(docs)} documents from 2023 10-K filing")


✅ RAG SYSTEM WITH CONTEXT
🔍 Found 5 relevant documents
Question: What was Apple's total revenue in 2023? Please provide the exact number.
Answer: According to the information provided, Apple's total net sales (revenue) for 2023 were $383.285 billion.
Response time: 3.44s
Source: ✅ 5 documents from 2023 10-K filing
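
The prompt above joins the retrieved chunks into one undifferentiated block. One possible variation, sketched here as an assumption rather than as part of the original notebook, labels each chunk with its page number so the model can say which source it used. The `Doc` dataclass is a hypothetical stand-in that only mimics the `page_content` and `metadata` attributes of a retrieved document, and `build_prompt` is likewise a made-up helper.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def build_prompt(question: str, docs: list[Doc]) -> str:
    """Assemble a RAG prompt whose context blocks carry source labels."""
    blocks = []
    for i, doc in enumerate(docs, 1):
        page = doc.metadata.get("page", "Unknown")
        blocks.append(f"[Source {i}, page {page}]\n{doc.page_content}")
    context = "\n\n".join(blocks)
    return (
        "Based on the following excerpts from Apple's 2023 10-K filing, "
        "answer the question and cite the source labels you used.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n\n"
        "Answer:"
    )


# Placeholder chunks, not real text from the filing
demo_docs = [
    Doc("Placeholder text for the first chunk.", {"page": 3}),
    Doc("Placeholder text for the second chunk."),
]
demo_prompt = build_prompt("What does the first chunk say?", demo_docs)
print(demo_prompt)
```

Passing the result as the `content` of the user message would slot into the same `ollama.chat` call used in this step.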


## Step 6: Compare Results

In [None]:
print("\n📊 COMPARISON RESULTS")
print("=" * 60)

print("\n❌ LLM WITHOUT RAG:")
print(f"   {llm_response}")
print(f"   ⏱️  Response time: {llm_time:.2f}s")
print(f"   📄 Source: None")
print(f"   🎯 Reliability: Low (no verification possible)")

print("\n✅ LLM WITH RAG:")
print(f"   {rag_response['message']['content']}")
print(f"   ⏱️  Response time: {rag_time:.2f}s")
print(f"   📄 Source: {len(docs)} documents")
print(f"   🎯 Reliability: High (traceable sources)")

print("\n📚 RETRIEVED SOURCES:")
for i, doc in enumerate(docs[:3], 1):
    page_num = doc.metadata.get('page', 'Unknown')
    print(f"   Source {i} (Page {page_num}): {doc.page_content[:100]}...")

print("\n" + "="*60)
print("💡 Why did this difference occur?")
print("="*60)
print("""
Did you see that? On its own, the LLM declined to answer because its training data
predates Apple's 2023 results. The RAG pipeline retrieved passages from the 2023 10-K
we supplied and answered with a specific figure.

We can also trace the answer back to the chunks it was based on.
This is the power of RAG!
""")

## Step 7: Test Additional Questions

In [10]:
# Test additional questions
test_questions = [
    "What are Apple's main business segments?",
    "What was Apple's gross margin in 2023?",
    "What are the main risk factors for Apple?"
]

def ask_rag(question):
    docs = retriever.invoke(question)
    context = "\n\n".join([doc.page_content for doc in docs])
    
    prompt = f"""
Based on Apple's 2023 10-K filing:

{context}

Question: {question}
Answer:"""
    
    response = ollama.chat(
        model='llama3.1:8b',
        messages=[{'role': 'user', 'content': prompt}]
    )
    
    return response['message']['content']

print("\n🔍 ADDITIONAL TEST QUESTIONS")
print("=" * 50)

print("\n✅ Success Cases:")
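# Use a separate loop variable so the global `question` from Step 2 is not overwritten
for i, test_question in enumerate(test_questions, 1):
    print(f"\n{i}. {test_question}")
    answer = ask_rag(test_question)
    print(f"   Answer: {answer[:150]}...")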

# Demonstrate limitations
failure_question = "What was the market reaction to Apple Vision Pro's initial sales volume?"

print("\n\n❌ Limitation Demonstration:")
print(f"Question: {failure_question}")
failure_answer = ask_rag(failure_question)
print(f"Answer: {failure_answer}")

print("\n" + "="*60)
print("🚨 Did you see the limitations of Naive RAG?")
print("="*60)
print("""
The 2023 10-K report doesn't contain specific Vision Pro sales data.
Naive RAG shows limitations when:
- The query's wording doesn't closely match the relevant passage, so retrieval misses it, or
- The question asks about content that isn't in the documents

In Parts 2 and 3, we'll address these problems by introducing:
🔹 Advanced Qdrant filtering
🔹 Hybrid search (semantic + keyword)
🔹 Query rewriting techniques
""")


🔍 ADDITIONAL TEST QUESTIONS

✅ Success Cases:

1. What are Apple's main business segments?
   Answer: According to the text, Apple's main business segments are:

1. Products:
	* Smartphones (iPhone line)
	* Personal computers (Mac line)
	* Tablets
	* W...

2. What was Apple's gross margin in 2023?
   Answer: Unfortunately, the provided text does not mention Apple's gross margin for 2023.

However, we can infer that to find the gross margin, we would need t...

3. What are the main risk factors for Apple?
   Answer: Based on Apple's 2023 10-K filing, the main risk factors for Apple include:

1. **Quality and product reliability issues**: Failure to detect and fix ...


❌ Limitation Demonstration:
Question: What was the market reaction to Apple Vision Pro's initial sales volume?
Answer: There is no information in the provided text about Apple Vision Pro's initial sales volume or market reaction. The text does mention that Apple Vision Pro, a spatial computer featuring visionOS, "is expe
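
To give a feel for what "semantic + keyword" means, here is a toy sketch that blends a semantic similarity score with a simple keyword-overlap score. It is not necessarily the approach the later parts take: the helper names `keyword_score` and `hybrid_score`, the weight `alpha`, the placeholder chunk texts, and the semantic scores are all invented for illustration, and production systems usually use a proper sparse scorer such as BM25 instead of raw term overlap.

```python
import re


def keyword_score(query: str, text: str) -> float:
    """Fraction of distinct query terms that appear verbatim in the text."""
    query_terms = set(re.findall(r"[a-z0-9]+", query.lower()))
    text_terms = set(re.findall(r"[a-z0-9]+", text.lower()))
    if not query_terms:
        return 0.0
    return len(query_terms & text_terms) / len(query_terms)


def hybrid_score(semantic: float, keyword: float, alpha: float = 0.5) -> float:
    """Weighted blend: alpha weights the semantic score, (1 - alpha) the keyword score."""
    return alpha * semantic + (1 - alpha) * keyword


query = "gross margin 2023"
candidates = [
    # (placeholder chunk text, made-up semantic similarity score)
    ("Placeholder chunk about net sales in 2023.", 0.62),
    ("Placeholder chunk about gross margin in 2023.", 0.58),
]
ranked = sorted(
    candidates,
    key=lambda c: hybrid_score(c[1], keyword_score(query, c[0])),
    reverse=True,
)
print(ranked[0][0])
```

In this toy setup the second chunk has the lower semantic score but contains every query term, so the blended score moves it to the top. A purely semantic ranking would have preferred the first chunk.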

### Implementation Note:
We implemented RAG manually step-by-step for educational purposes. LangChain provides higher-level abstractions like `RetrievalQA` chains that can make this process more concise, but understanding the underlying mechanics helps you customize and debug your RAG systems more effectively.

## Summary

🎉 **You've successfully built your first RAG system!**

### What we accomplished:
- ✅ Downloaded and processed Apple's 2023 10-K filing
- ✅ Implemented document chunking strategy  
- ✅ Created embeddings using free, local models
- ✅ Set up Qdrant vector database
- ✅ Built a complete RAG pipeline
- ✅ Demonstrated a clear improvement over the vanilla LLM on the revenue question

### Key improvements with RAG:
1. **Factual accuracy** - Answers are based on the actual filing
2. **Source traceability** - Each answer can be checked against the retrieved pages
3. **Up-to-date information** - Uses fiscal 2023 data the base model did not have
4. **Reduced hallucination** - Responses are grounded in the provided context
5. **Domain expertise** - Draws on the filing's financial detail rather than general knowledge